Saturday, August 22, 2009

Disaster Recovery - But for Real

This past week I've been doing the preliminary work (installing servers mostly) to get ready for our scheduled disaster recovery test. I expect that we'll learn a lot about our existing plan and systems documentation and will be looking to make some changes that will make any need for a large recovery faster and more effective.

Meanwhile, I'm managing some real disaster recovery, but on a smaller scale. A few weeks ago I posted about the need to upgrade our ImageRight installation to resolve a bug that could cause some data loss. The ImageRight support staff worked hard to run the preliminary discovery/fixing of the image files and database, followed by performing the upgrade.

Not long after, I got an email from someone in another department asking me to "find" the annotations added to a invoice that seemed to have gone missing. She was assuming that since some temporary help had worked on the document, a user error had been made and a "copy without annotations" had been introduced. I could recover the annotations by looking through the deleted pages and at previous versions of those pages.

However, what I found was a bit unexpected. I found a history of changes being made to the document, but no actual annotations visible. Curious.

So I opened a support ticket. After several remote sessions and research, the ImageRight team was "nearly positive" (they need more testing to confirm) that the process run before our last upgrade to correct the potential data loss, actually introduced a different kind of data loss. The result is that the database knows about the affected annotations happening, but the physical files that represent the annotated versions had been replaced with non-annotated versions.

We do have the logs from the original process, so it was just a matter of ImageRight Support parsing that data to generate a list of files that were changed. Now we begin the task of recovering those files from tape.

Our Sr. DBA had been working on side project that loads all our backup catalogs into a database so we have a comprehensive reference from all backup servers to identify what tapes to recall when people ask for recoveries. That project is proving its worth this time around, since we need to locate and restore over 1000 files. He also needs to cross referencing them to the individual documents accessible via the desktop client so we can do a visual comparison to any special cases and to provide a record of which documents were affected in a format that's understandable to everyone else, in case additional concerns come up after we repair the damage.

Our current plan is to have this resolved by the end of next weekend, but this is something that needs careful handling since we don't want end users to have any doubt about the integrity of the system, which I still have total confidence in once we sort this out. Thus, I'm happy to spend the extra time making sure no other issues are introduced.

Plus I need some time to find our DBA some really nice coffee for his efforts.

No comments:

Post a Comment

MS ITPro Evangelists Blogs

More Great Blogs