This week marks the start of TechNet on Tour, coming to twelve cities. The full-day workshops include lecture and hands-on labs where you can learn about some of the ways you can use Microsoft Azure to help with your disaster recovery planning.
But let me tell you about the first "business continuity" plan I was part of. It involved a stash of tapes: daily backups on a two-week cycle, with the Friday backups held for a month. The nightly backup job fit on two tapes, and every morning I ejected them from the machine and dropped them in my bag. They went home with me, across town, and came back the next day to be swapped for the latest ones. Whenever I took a vacation, I designated someone else to do the same thing. That was it. The tapes were rarely looked at, the data never tested and, fortunately, never needed. We were partying like it was 1999. Because it was.
Still, the scenario
isn't uncommon. There are still lots of
small businesses, with only single locations and still lots of tapes out there. But now, there is more data and more urgency for that data to be recovered as quickly as possible with as
little loss as possible. And there are
still only 24 hours in the day. How annoying to arrive at work in the morning,
only to find the overnight backup job still running.
As I moved through jobs and technologies evolved, we addressed the growing data and the shrinking backup window in many ways… Adjusting backup jobs to capture less critical or infrequently changing data only over the weekends. More jobs that captured only delta changes. Fancier multiple-tape changers, higher-density tapes, local "disk to disk" backups that were later moved to tape, even early "Internet" backup solutions, often offered by the same companies that handled your physical tape and box rotation services.
We also chased that holy grail of "uptime". Failures weren't supposed to happen if you threw enough hardware in a room: dual power supplies, redundant disk arrays, multiple disk controllers, UPS systems with various bypass offerings. More and more layers to protect the computers and the data.
Testing was something we wanted to do more often. But it was hard to justify additional hardware purchases to upper management, and hard to find the time to set up a comprehensive test. We tried anyway, often failed, and learned. Because each test or real outage is a great opportunity to learn. Outages are often perfect storms… if only we had swapped out that dying drive a day earlier, if only that piece of hardware was better labeled, if only that process was better documented… and each time, we made improvements.
I remember, after a lengthy call with a co-location facility that wanted us to sign a one-year agreement even though we only needed space for three months to run a recovery test, how I wished for something I could use for just the time I needed. It's been a little over five years since that phone call, and finally there is an answer: "the cloud".
Is there failure in the cloud? Of course; it's inevitable. For all the abstraction, it's still just running on hardware. But the cloud provides part of an answer that many businesses simply didn't have even five years ago. Businesses that never recovered from the likes of Katrina and other natural or man-made disasters might still have a shot today.
So catch a TechNet on Tour if it passes through your area. Look at taking advantage of things like using the cloud as a backup target instead of tape, or replicating a VM to Azure with Azure Site Recovery. Even starting to dabble in better documentation, or scripting with PowerShell to make your key systems more consistently reproducible, will go a long way; something like the rough sketch below is a start. And do a "tabletop" dry run of your existing DR plan today.
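And I don't mean anything fancy by "scripting". Just as a rough, minimal sketch (the output path is made up, and it assumes a Windows Server box with the ServerManager and SmbShare modules available), here's the kind of thing I have in mind: capture what's installed and how it's configured, and keep the output with your DR documentation.

# Minimal configuration-capture sketch. Assumes Windows Server with the
# ServerManager and SmbShare modules; the output location is an example only.
$stamp  = Get-Date -Format 'yyyy-MM-dd'
$outDir = "C:\DR-Docs\$env:COMPUTERNAME\$stamp"   # hypothetical path
New-Item -Path $outDir -ItemType Directory -Force | Out-Null

# Installed roles and features: what you'd need to re-add after a rebuild
Get-WindowsFeature |
    Where-Object { $_.Installed } |
    Select-Object Name, DisplayName |
    Export-Csv -Path "$outDir\features.csv" -NoTypeInformation

# Services and their startup modes (WMI, so it works on older PowerShell too)
Get-WmiObject -Class Win32_Service |
    Select-Object Name, DisplayName, StartMode, State |
    Export-Csv -Path "$outDir\services.csv" -NoTypeInformation

# File shares and network settings: the things nobody remembers at 2 a.m.
Get-SmbShare | Select-Object Name, Path, Description |
    Export-Csv -Path "$outDir\shares.csv" -NoTypeInformation
ipconfig /all | Out-File -FilePath "$outDir\ipconfig.txt"

Even a simple dump like that turns a rebuild from guesswork into a checklist, and it's the sort of thing you can schedule and forget about until you need it.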
Sysadmins don't let other sysadmins drop DLT tapes in their bags. Let's party like it's 2015. Because it is.