About four years ago, I did my very first webcast on the Availability Groups feature in SQL Server 2012. The premise of the presentation was how we can provide high availability features to existing SQL Server workloads without making changes to the underlying storage. For my demo, I used an existing database mirroring configuration for high availability with log shipping for disaster recovery as my Availability Group replicas. One of the features that I highlighted was read-only routing for readable secondary replicas.
One of the most common question that I get from customers who have already implemented Availability Groups with readable secondary replicas is the ability to load balance the incoming read-only workload across multiple replicas. When I get asked this question, I assume the following:
- 1) They have an Enterprise Edition license for all of the readable secondary replicas, including the primary, of course
- 2) They will update their applications to use the latest SQL Server Native Client
- 3) They will include the Application Intent=ReadOnly property in their application connection string
If you take a look at the SQL Server Books Online documentation for implementing read-only routing, you can see the note below, emphasis mine.
Indeed, as of SQL Server 2014, load balancing the read-only workloads across all available readable secondary replicas is not supported. Customers would end up implementing their own custom solution like having a hardware load balancer like the F5 Big IP LTM in front of the Availability Group to handle the read-only workload. They would work really well especially if your goal is to just balance the read-only workloads. The only challenge here is that this layer is managed by a different team – the network engineering team. It means that HA/DR processes need to be modified to include the complexities of this layer. It also means that the communication also needs to be well defined for escalation.
It’s why customers have asked for a better option: why not have this feature available in the product?
An Out-of-the-Box Feature
Microsoft listened to their customers. SQL Server 2016 now has this feature available. You can try it out for yourself to see how it works.
But let me explain how it worked in SQL Server 2012/2014 and how it works in the upcoming version of SQL Server.
What You Need
- 1) Availability Group listener name. I call this a virtual server name because it encapsulates application connectivity to the database. You can move your databases to different SQL Server instances with different hostnames without changing your application connection string. In order for read-only routing to work, the Availability Group must have a listener name. Some customers do not use a listener name because of their specific network configuration. But in order to take advantage of read-only routing, you need a listener name.
- 2) Readable Secondary Replicas. By default, Availability Group secondary replicas are in constant recovery mode. This is because the REDO process is constantly replaying transaction log records. In order to allow read-only queries to Availability Group secondary databases, you need to configure them with read-only access.
- 3) Read-only routing URL. The read-only routing URL is an endpoint that a read-only client application connects to. The key to this is having the Application Intent=ReadOnly property in the connection string. It has to be configured on all of the replicas that you want to assign as a readable secondary, as per the last item in this list. The read-only routing URL only takes effect when the current replica is running as a secondary replica. You can think of it as an ON/OFF switch – it is turned OFF when the current replica is acting as primary replica.
- 4) Read-only routing list. As the name implies, this is a list of readable secondary replicas where you want your read-only client applications to connects to, again, as per the last item in this list. Unlike the read-only routing URL, the read-only routing list only takes effect when the current replica is acting as a primary replica – a complete opposite of the read-only routing URL.
- 5) Licenses. I cannot overemphasize this. The moment you run any workload on a standby SQL Server instance, it will require a license. Same is true for an Availability Group readable secondary replica.
How It Works in SQL Server 2012/2014
In SQL Server 2012/2014, when a client application connects to the Availability Group via the listener name, it gets evaluated for the Application Intent=ReadOnly property in the connection string. If the connection string attribute exists, it checks the read-only routing list. As per the SQL Server Books Online documentation,
Read-intent connection requests are routed to the first available readable secondary on the read-only routing list of the current primary replica.
This means that the algorithm will read the routing list, and stops at the very first one that responds. For example, say you have SERVER1, SERVER2 and SERVER3 in the read-only routing list. If SERVER1 immediately responds to the request, then all the subsequent read-only client applications will only be redirected to SERVER1. Depending on the read-only workload, this may introduce a ton of activities on SERVER1 while SERVER2 and SERVER3 are sitting there idle, doing nothing but receiving transaction log records from the primary replica and running REDO. The only time that SERVER2 would be used for read-only workloads is when SERVER1 becomes unavailable.
How It Works Starting With SQL Server 2016
The behavior is similar in SQL Server 2016. The big difference – availability of load balancing across all readable secondary replicas. As per the SQL Server Books Online documentation,
Read-intent connection requests are routed to the first available entry on the read-only routing list of the current primary replica. However, load-balancing across read-only replicas is supported.
This means that instead of just stopping at the first readable secondary replica in the read-only routing list, it will go thru all of the items in the list and redirect read-only client applications on all of them. Woohoo!
The Magic of the Parenthesis
In mathematical expressions, parentheses are used to denote modifications to normal order of operations (precedence rules.) By default, when you configure an Availability Group read-only routing list using the same way as you did before, the behavior will be like that of SQL Server 2012/2014. For example, using the T-SQL code below to define a read-only routing list to an existing Availability Group with three replicas, all read-only client applications will be routed to SERVER2 but not on SERVER3 unless SERVER2 becomes unavailable.
ALTER AVAILABILITY GROUP SQL2016AG
MODIFY REPLICA ON
This is the exact same syntax that you would use with SQL Server 2012/2014. What will make it balance the read-only workload using a round-robin algorithm is the use of an additional parenthesis inside the read-only routing list. For example, using the T-SQL code below to define a read-only routing list to an existing Availability Group with three replicas, all read-only client applications will be routed to SERVER2 first, then, to SERVER3. I’ve highlighted the additional parenthesis for emphasis
ALTER AVAILABILITY GROUP SQL2016AG
MODIFY REPLICA ON
SQL Server Books Online calls this the nested parenthesis. By using the nested parenthesis, you can configure load balancing of read-only workloads across all or just a subset of your readable secondary replicas. At this time, only one level of nested parenthesis is supported for load balancing of read-only workloads.
Only New Connections Get Re-routed
When an Availability Group failover happens, the Windows Server failover Cluster, takes the resource group offline. This includes taking the Availability Group listener name offline. Depending on how your client applications are written, a connection to a readable secondary replica can be retained during a failover. Of course, this assumes that the current readable secondary replica that the client application is connected to does not become the new primary replica. If it does, the client application will get disconnected and re-routed to the routing list defined on the current primary. This is to make sure that interruptions to read-only workloads are avoided whenever necessary.
For example, say SERVER1 is the current primary replica while SERVER2 and SERVER3 are configured as readable secondary replicas. If the Availability group is moved from SERVER1 to SERVER2, read-only workloads on SERVER2 will be redirected to SERVER1. However, read-only workloads on SERVER3 will remain since it is still acting as a secondary replica. New connections to the Availability Group will be evaluated based on the current primary replica; read-only workloads will be routed accordingly.
Test It Out
I have been working with SQL Server 2016 for a while now and tested out these features. I have been using my good-old Windows client application (it requires .NET Framework 4.0) from my very first Always On Availability Group presentation for testing this behavior. The one thing that I noticed was that sometimes the client application does not automatically get redirected to the other read-only replicas for load balancing. For example, if I move the Availability Group from SERVER3 to SERVER1 as primary, I expect the read-only client applications to be automatically redirected to SERVER2 and SERVER3. Sometimes, they are all pegged to SERVER2. I think it’s because when SERVER1 becomes the new primary, my read-only routing list is defined with SERVER2 and SERVER3, respectively. Since SERVER2 is still serving read-only requests, nothing changes. But the change from primary to secondary role on SERVER3 causes it to get disconnected. When it reconnects back, it just follows the sequence of secondary replicas defined routing list – connects first to SERVER2. This creates the impression that read-only client connections are just getting routed to SERVER2 when, in fact, it works as expected.
I’ve provided this feedback to the SQL Server product team responsible for Availability Groups. They have been very helpful in answering some of my questions. All I can say is that they have a really great team that understands what their customers’ needs are and prioritize on what matters.
I’ll write another blog post on the results of my testing the new database-level failure detection in SQL Server 2016 Availability Groups.