It seems inevitable that while restoring Active Directory in a disaster recovery scenario, one is going to feel rushed. Even though this was a test environment, I felt that getting AD back needed to be quick so we could move on to the more user-facing applications, like Exchange.
My network has two Active Directory domains, a parent and a child domain in a single forest. The design is no longer appropriate for how our company is organized, and we've been slowly working to migrate servers and services to the root domain. Right now, we are down to the last 3 servers in our child domain and one remaining service account. The end is in sight, but I digress.
The scope of our disaster recovery test does not involve restoring that child domain. This is becoming an interesting exercise, because it will force us to address how to get those few services that reside in that domain working in the DR lab. This will also help us when we plan the process for moving those services in production.
Bringing back a domain controller for my root domain went by the book. I could explain away all of the random error messages: they were all related to this domain controller being unable to replicate to other DCs that hadn't been restored. I had recovered the DC that held the majority of the FSMO roles and seized the others. I started moving on to other tasks, but I couldn't get past the errors about this domain controller being unable to find a global catalog. All the domain controllers in our infrastructure are global catalogs, including this one, and I hadn't made any change to the NTDS settings after it was restored.
So I took the "tickle it" approach and unchecked/rechecked the Global Catalog option. The newly restored DC successfully relinquished its GC role and then refused to complete the process of regaining it. It was determined to verify its status with the other domain controllers it knew about but couldn't contact.
I knew that for this exercise, I wasn't bringing back any other domain controllers. And in reality, even if I were going to need additional DCs, it would be far easier (and less error-prone) to just promote new machines than to bother restoring every DC in our infrastructure from tape. (However, I still back up all my domain controllers, just to be prepared.)
To solve the issue, I turned to metadata cleanup. Using NTDSUTIL, I removed the references to the other DC for the root domain, the DC for the child domain and, finally, the lingering and now orphaned child domain itself. I also had to go into "AD Domains and Trusts" to delete the trust to the child domain, which wasn't removed when the metadata was deleted. Once all these references were removed, the domain controller was able to assume the global catalog role and I could comfortably move on to restoring our Exchange server.
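For anyone who wants the cleanup order spelled out, here is a rough sketch of it as a small Python helper. It only builds the list of prompts to type into an interactive ntdsutil session; it does not run anything. The server name `DC01` and the index numbers are hypothetical placeholders, and the real indices have to be read from the `list` output in your own session. Treat it as a checklist under those assumptions, not as the exact sequence used here.

```python
from typing import List


def _connect(live_dc: str) -> List[str]:
    """Prompts that open ntdsutil and bind to a reachable DC (name is a placeholder)."""
    return [
        "ntdsutil",
        "metadata cleanup",
        "connections",
        f"connect to server {live_dc}",
        "quit",
    ]


def server_cleanup_steps(live_dc: str, domain_idx: int,
                         site_idx: int, server_idx: int) -> List[str]:
    """Prompts for removing one unrecoverable DC's metadata.

    The indices are assumptions: read the real ones from the
    'list ...' output before typing each 'select ...' line.
    """
    return _connect(live_dc) + [
        "select operation target",
        "list domains",
        f"select domain {domain_idx}",
        "list sites",
        f"select site {site_idx}",
        "list servers in site",
        f"select server {server_idx}",
        "quit",
        "remove selected server",
    ]


def domain_cleanup_steps(live_dc: str, domain_idx: int) -> List[str]:
    """Prompts for removing an orphaned domain once its DCs are already cleaned up."""
    return _connect(live_dc) + [
        "select operation target",
        "list domains",
        f"select domain {domain_idx}",
        "quit",
        "remove selected domain",
    ]


if __name__ == "__main__":
    # Hypothetical values purely for illustration.
    for line in server_cleanup_steps("DC01", 0, 0, 1):
        print(line)
```

The order matters: the domain controllers that belonged to a domain are removed first, and the orphaned domain itself goes last. The trust object still has to be deleted separately in "AD Domains and Trusts".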
And I've learned that just because I can explain an error doesn't mean I can ignore it.