Ever wonder what goes on under-the-hood when you deploy a SQL Server Always On Availability Groups (AGs) for high availability?
I’ve been writing about AGs since it was introduced in 2012. Back when there wasn’t a lot of documentation available about the feature, we relied on direct interaction with the product team and conversations with community experts who had access to the beta release.
I was lucky enough to be a part of the early adopters, partly because I specialized in high availability and disaster recovery. But also because a few members of the SQL Server product team and community experts were into music. We were talking guitars and amps before we even talked about SQL Server.
More than a decade later and thousands of documentation, articles, blog posts, video tutorials, etc. written about it, there’s still very few about the interaction between SQL Server and the platform that makes high availability possible.
The goal of this series of blog posts goes beyond deploying AGs. There’s more than enough of those out there. Rather, it is to provide insights as to what goes on under-the-hood inside both SQL Server and the Windows Server Failover Cluster (WSFC) as you’re building AGs.
Because the WSFC is what makes SQL Server highly available.
SQL Server runs on top of the WSFC. It lets the WSFC handle the failure detection and automatic failover for the AG.
Remove the WSFC and your AGs are simply a means to replicate data to a different server. Think read-scale AGs.
An understanding of how WSFCs work in the context of AGs will give you insights on how to troubleshoot issues that led to an outage.
Because if your goal is high availability, you need to make sure that your databases remain highly available.
And when those unfortunate times come that you experience unexpected outages, you’ll know how to find the root cause of the issue to make sure it never happens again.
High Level Overview: Let’s Start From the Top
Before you can create an AG, you need to enable the Always On Availability Groups feature on every SQL Server instance that will use this feature.
This is more than a simple checkbox. This is what tells SQL Server to start integrating with the WSFC to leverage it for high availability.
I know I’m oversimplifying here. But bear with me. Because there’s a lot that goes on in this simple checkbox that most DBAs and sysadmins realize.
Before we start diving deep into the WSFC, let’s have a look at this. I recorded a video of the AG creation process (assuming all prerequisites have been met, like Active Directory permissions to create a listener name). I automated the AG creation process using dbaTools.
#Enable Always On feature
Enable-DbaAgHadr -SqlInstance TDPRD021 -Force
#Create SQL Server login
New-DbaLogin -SqlInstance TDPRD021 -Login TESTDOMAIN\sqlservice
#Configure Hadr_endpoints and permissions
New-DbaEndpoint -SqlInstance TDPRD021 -Name hadr_endpoint -Port 5022 | Start-DbaEndpoint
Invoke-DbaQuery -SqlInstance TDPRD021 -Query "GRANT CONNECT ON ENDPOINT::hadr_endpoint TO [TESTDOMAIN\sqlservice]"
#Enable Always On feature
Enable-DbaAgHadr -SqlInstance TDPRD022 -Force
#Create SQL Server login
New-DbaLogin -SqlInstance TDPRD022 -Login TESTDOMAIN\sqlservice
#Configure Hadr_endpoints and permissions
New-DbaEndpoint -SqlInstance TDPRD022 -Name hadr_endpoint -Port 5022 | Start-DbaEndpoint
Invoke-DbaQuery -SqlInstance TDPRD022 -Query "GRANT CONNECT ON ENDPOINT::hadr_endpoint TO [TESTDOMAIN\sqlservice]"
#Backup databases in preparation for adding to the AG
Backup-DbaDatabase -SqlInstance TDPRD021 -Database TST-DB, Northwind, SalesDB
#Create Availability Group
New-DbaAvailabilityGroup -Primary TDPRD021 -Secondary TDPRD022 -Name AG01 -Database TST-DB, Northwind, SalesDB -EndpointUrl 'TCP://TDPRD021.TESTDOMAIN.COM:5022', 'TCP://TDPRD022.TESTDOMAIN.COM:5022' -SeedingMode Automatic -FailoverMode Automatic -AvailabilityMode SynchronousCommit -Confirm:$false
Add-DbaAgListener -AvailabilityGroup AG01 -SqlInstance TDPRD021 -IPAddress 172.16.0.27 -Name TDPRDSQLAGLN27
#Enable AlwaysOn_health Extended Events session
Get-DbaXESession -SqlInstance TDPRD021 -Session AlwaysOn_health | ForEach-Object -Process { $_.AutoStart = $true ; $_.Alter() ; $_ | Start-DbaXESession }
Get-DbaXESession -SqlInstance TDPRD022 -Session AlwaysOn_health | ForEach-Object -Process { $_.AutoStart = $true ; $_.Alter() ; $_ | Start-DbaXESession }
#Set Database-level health detection for the Availability Group
Set-DbaAvailabilityGroup -SqlInstance TDPRD021 -AvailabilityGroup AG01 -DatabaseHealthTrigger
But instead of looking at it from the point-of-view of SQL Server, let’s look at it from the point-of-view of the WSFC.
I slowed the video down so you can observe the different activities that’s happening during the AG creation process. Pay attention to what’s happening.
Can you see what went on in the AG creation process?
Post the sequence of events in the Comments section based on your observation.
I did say this is an introduction, right?

