SQL Server Availability Groups have been around for a while. A lot of organizations are still in the process of implementing them as they upgrade their database platforms to the latest version of SQL Server. As more and more organizations move away from older versions and into the newer ones, Availability Groups will become much more common. And with the Basic Availability Groups feature made available starting with SQL Server 2016 Standard Edition, it will eventually become a viable replacement for the already deprecated Database Mirroring feature.
In the first installation in this series, I talked about how Active Directory Domain Services (ADDS) authentication and replication affect a SQL Server Availability Group implementation in a disaster recovery (DR) site. As a follow-up to that blog post, here’s another thing that you need to consider: Domain Name System or DNS. The reality is that client applications need to know where the database server is in order to properly connect to it. Since computers locate each other via IP addresses, we humans need a way to translate those IP addresses in a more simple way. Think names-for-codes.
Active Directory-integrated DNS Zones
When an Active Directory domain controller is initially setup and configured, an administrator has the option to configure DNS along side it. This makes it easy to store DNS zones inside ADDS. As a side-effect, when the ADDS database gets replicated across all of the domain controllers, the data in the DNS zones gets replicated with it. This eliminates the need for a separate replication mechanism for the DNS zone records.
Now, we’ve already covered the default replication frequency of ADDS and how that affects authentication of client computers and user accounts in a DR site. We don’t really have to worry about the Availability Group Listener Name getting replicated to the domain controller on the DR site so long as it does get properly replicated. Besides, I doubt that you will keep changing your Availability Group Listener Name every day.
More Than One Virtual IP Address
The key thing to remember with the ADDS replication is that you only deal with one record associated with the Availability Group Listener Name. But what about the corresponding DNS record? In a multi-subnet failover cluster much like when you stretch your cluster between your production and DR sites, you will have more than one virtual IP address. That’s mainly because your network administrators hate the idea of the additional overhead associated with managing a stretched VLAN. Windows Server 2008 introduced support for multi-site (also called multi-subnet or geographically dispersed) failover clusters – the ability to have multiple virtual IP addresses assigned to a failover cluster and all the cluster resources running on top of it. The number of virtual IP addresses that can be assigned to a virtual network name can be as many as the number of network subnets that the failover cluster will stretch over to.
The default behavior of the virtual network name is to register the DNS entry associated with the network subnet that is currently hosting the resource. Let’s use some examples to illustrate this point for a multi-subnet virtual name resource. I’m using this screenshot from the article I wrote on MSSQLTips a few years back for installing SQL Server 2012 failover clustered instance (same concept applies to an Availability Group Listener Name.)
Notice the IP addresses available for configuration – 172.16.0.113 and 192.168.0.113.
In Windows Server 2008 and Windows Server 2008 R2, after the configuration, when the 172.16.0.113 virtual IP address is initially brought online, it will register the IP-to-virtual name mapping to the nearest DNS server available to it. Even when the DNS zone gets replicated to the DR site, only the 172.16.0.113 virtual IP address will be registered to all of the DNS servers in the network. The only time that the 192.168.0.113 virtual IP address will be registered on the DNS server (and eventually get replicated to all of the DNS servers on the network) is when it is brought online on the same network subnet as the node hosting it.
With Windows Server 2012 and higher, the behavior is quite different. Both virtual IP addresses will be registered to the nearest DNS server regardless of which one is online. The DNS replication process will take care of replicating the DNS records to all of the DNS servers in the network, including the ones in the DR site.
When The Client Application Gets Lost
Remember what I said about client applications communicating to the database server thru the Availability Group Listener name using IP addresses? What if that IP address is not available when the client application connects to the database server? It’s like telling your friend to go to a specific store to make daily deliveries but when he gets there, it’s in the middle of nowhere. That’s basically how the client application behaves. Because the DNS server only gave it one virtual IP address instead of all the available virtual IP addresses for the same virtual network name, the client application has no idea that there is another option to find what it is looking for. From a user experience point of view, the application – including the database that you made sure was highly available – is technically offline.
Give The Client Application Another Address
There’s a way to make sure that the client application won’t get lost when it tries to find the the database server: give it all of the available virtual IP addresses. The way to do that is to make sure that all of the virtual IP addresses for a specific Availability Group Listener Name are registered on the DNS servers. Depending on the version of the Windows Server operating system that you’re using for your cluster, this behavior might be different. There is a specific property of the virtual server name resource running on the Windows Server Failover Cluster called RegisterAllProvidersIP. This property determines whether or not all of the virtual IP addresses of a specific virtual server name resource are registered or not; a value of 1 means that it all virtual IP addresses will be registered on the DNS server and a value of 0 means only the one that is currently online will be registered. Unfortunately, this property can only be modified using Windows PowerShell. Below is a sample code to change this property.
Get-ClusterResource 'Name of your Availability Group Listener Name' | Set-ClusterParameter RegisterAllProvidersIP 1
By setting this property value to 1, all of the virtual IP addresses of the Availability Group Listener Name will be registered on the nearest DNS server which will, then, get replicated to all of the other DNS servers in your network.
Make The Client Application Decide Faster
Now, even if the client application has all of the available virtual IP addresses for the Availability Group Listener Name, it won’t just go ahead and immediately connect to the one that is available. There’s a reason for this. Imagine the friend that you asked to go to a specific store to make daily deliveries. You certainly don’t want him to keep asking you every minute where the store is. You give him the address once and he will have to use that information every time he needs to go to the store. And maybe, just maybe, he would come to you every last Friday of the month to make sure that nothing has changed in the address.
From a client application’s point of view, this is called the DNS client time-to-live (TTL) value. By default, this is set to 20 minutes – this is if the DNS Service registers records for its own zones and the same as when the Windows Server Failover Cluster registers the Availability Group Listener Name (it’s 15 minutes if you used DHCP-assigned IP address for the Availability Group Listener Name.) This means that the client application will have to use that same information for 20 minutes before asking the DNS server again for new information about the IP-to-virtual name mapping.
What if immediately after the client application requests the DNS server for the new IP-to-virtual name mapping, the Availability Group failed over to the DR site? That would mean waiting for another 20 minutes before the client application even finds out that there has been a change in the virtual IP address. That would also mean that your recovery time objective (RTO) should be greater than 20 minutes? Would that meet your service level agreements?
Good thing there is a way to change this default behavior. Similar to the RegisterAllProvidersIP property, the virtual server name resource also has another property called HostRecordTTL. With this property, you can change the DNS client TTL value (in seconds) to something lower than 20 minutes. Below is a sample PowerShell code to change this property.
Get-ClusterResource 'Name of your Availability Group Listener Name' | Set-ClusterParameter HostRecordTTL 300
Changing this property means that all client applications will have to ask the DNS server every 5 minutes to check for any changes in the e IP-to-virtual name mapping. But you have to be very careful with this. Be sure to ask your network administrators about the network traffic workload. If you only have a server farm of less than 10 servers connecting to the Availability Group, this wouldn’t be a big of a deal on the network. But what if you have thousands of client applications all connecting to the Availability Group thru the Listener Name? Your network will immediately experience severe performance due to massive requests from DNS clients.
Make The Client Applications Smarter
Wouldn’t it be nice if you can get your friend to decide without asking you? If he couldn’t find the first store that you gave him, should he be going immediately to the next store in the list? Since the client application already has all of the virtual IP addresses for the Availability Group Listener Name, there’s no sense in waiting for the DNS client TTL value to expire. We’ve already exhausted all of our options in the network to make sure that the client applications can connect to the Availability Group. This is where we get the application developers involved to update the applications.
SQL Server has available drivers for the .NET Framework, ODBC, JDBC, etc. to handle this particular behavior. The client drivers support the connection string property MultiSubnetFailover=True. This connection string property provides faster detection of and connection to the currently active Availability Group Listener Name so that your client applications don’t have to rely on the DNS client TTL value to ask for the DNS server. This will certainly reduce the amount of downtime from the client application’s point of view when the Availability Group is brought online on the DR side.
Evaluate your existing SQL Server Availability Groups deployment and see if these configuration settings have been considered. If you cannot change the client application right away, talk to the network and systems administrators to have the HostRecordTTL and RegisterAllProvidersIP properties modified accordingly. You don’t want to be caught off guard when your Availability Group is brought online on the DR site.
- Active Directory-Integrated DNS Zones
- Requirements and Recommendations for a Multi-Site Failover Cluster
- DNS Processes and Interactions
- SQL Server Native Client Support for High Availability, Disaster Recovery
- SqlConnectionStringBuilder.MultiSubnetFailover Property