SQL Server Availability Groups have been around for a while. But not a lot of organizations have implemented them yet mainly because they haven’t upgraded their database platforms to SQL Server 2012/2014. As more and more organizations move away from older versions and into the newer ones, Availability Groups will become much more common. And with the Basic Availability Groups feature coming up in SQL Server 2016 Standard Edition, it will eventually become a viable replacement for the already deprecated Database Mirroring feature.
Dealing with SQL Server Availability Groups when the replicas/cluster nodes are in the same data center in the context of the network architecture is pretty straightforward. This applies to any cluster resources running on a Windows Server Failover Cluster (WSFC) like SQL Server failover clustered instances or even Microsoft Exchange Server. That’s because all we have to do as SQL Server administrators is convince the network engineers to make it work – unless we are the network engineers ourselves. Within the same data center, all of the replicas/cluster nodes will be in the same network subnet because that’s the easiest way to make the network engineers’ job a lot easier. But when you stretch your Availability Group to a different network subnet – for example, a disaster recovery (DR) site – you will have to deal with a whole new set of administrative challenges.
Below is the first of the five things that you need to consider when deploying SQL Server Availability Groups in a DR site. Note that these apply to any cluster applications supported to run on a WSFC.
WSFC cluster nodes running Windows Server 2008/2012 still need to be joined to the same Active Directory Domain Services (ADDS) domain, regardless of whether or not you are creating a cluster name object (CNO.) I wrote about this in a previous blog post where I highlighted the importance of ADDS in maintaining high availability in SQL Server deployments. Since the Availability Group Listener Name is created as a client access point in the WSFC, it will create a corresponding virtual computer object (VCO) in ADDS. Note that you have the option to create an Active Directory-detached WSFC with Windows Server 2012 R2 but that also means being limited to using mixed mode authentication in your SQL Server instance.
Now, even when your WSFC cluster nodes reside within the same data center, you still run the risk of having any of the available ADDS domain controllers offline at the same time – power failure, regularly scheduled downtime, network issue, etc. This is exactly how I found out the direct dependency of a WSFC in ADDS. More so when you stretch your WSFC across different data centers.
Active Directory Domain Services Replication
When you stretch out your WSFC across data centers, all of the nodes still need to authenticate to the ADDS domain controller regardless of where they are. This is why it is common to see an ADDS domain controller on a DR data center so that all of the servers in that data center need not go across the network and into the production data center to authenticate. The servers in the DR data center will be configured to authenticate to the ADDS domain controller closest to them. This means that, because all of the servers on both the production and DR data centers are members of the same ADDS domain, the domain controller on the DR data center need to contain the exact same ADDS database as that of the production domain controller.
Because the ADDS database is basically that – a database – it has to be replicated to other domain controllers for them to have a copy of the database. This replication is configured by none other than your ADDS administrator. Can you guess what the default value of the replication frequency is between sites? 3 hours.
That means it will take 3 hours between the time you create your Availability Group Listener Name on the production data center to get replicated on the DR data center’s domain controller. Luckily, this only needs to happen once since you won’t be modifying the Availability Group Listener Name on a regular basis. It’s not like your Active Directory domain account where the password changes every 42 days that you end up writing it on a Post-It note stuck on your computer monitor.
What If It’s Not There?
Let’s say that, after the WSFC and Availability Group Listener Name were created, something happened to the ADDS replication. I’ve resolved cases in the past where there were conflicts in the ADDS database that replication to the other domain controllers was stalled. Much like replication in SQL Server where the databases can get out-of-sync, this can happen to the ADDS database. Which also means that the WSFC cluster name object (CNO) and the Availability Group Listener Name VCO won’t be available in the DR domain controller. When you attempt a failover to the DR site, the WSFC – depending on the version of Windows Server that you’re running – may not even come online. That’s because the WSFC CNO will try to authenticate to the domain controller but the account is not there yet. Much like you getting an error message that your account cannot log in to the domain.
This will definitely impact your recovery time objective (RTO) because of the inability to bring the WSFC and all of the other clustered resources online. The SQL Server instances running on the WSFC nodes will still be online because they are using a different ADDS domain account as a service account and the credentials cached.
Be Nice To The AD Guys
I’ve mentioned this as well in my previous blog post. While it may not be your job to take care of ADDS, your job as a SQL Server administrator is now very much dependent on them. Buy them coffee or take them out for lunch. You’ll be in a much better shape to ask questions about ADDS replication and whether or not the domain controllers are healthy on both your production and DR data centers. Ask about the ADDS replication frequency and see if it will meet your overall RTO.
A Word On Windows Server 2016
With Windows Server 2016, the one feature to try out is the ability to create a WSFC even if the cluster nodes are not joined to a domain. This gives us the flexibility of building a highly available WSFC without the dependencies on ADDS, taking the Active Directory-detached WSFC feature up a notch. Which also means not having to worry about ADDS replication, ADDS authentication, etc. But that doesn’t mean you no longer have to be nice to your ADDS guys 🙂
- Why We Need To Understand How Active Directory Affects SQL Server High Availability
- Failover Clustering and Active Directory Integration
- What’s New in Failover Clustering in Windows Server Technical Preview
- Training: SQL Server Always On Availability Group: The Senior DBA’s Ultimate Field Guide