3 T.O.P. Reasons Why Your SQL Server Always On Availability Group Solution is Making Your Life Miserable – Learn SQL Server High Availability & Disaster Recovery

Have you ever felt like managing a SQL Server Always On Availability Group is making your life miserable? Is it severely messing with your sleep cycle? Causing you to miss very special occasions? Even meals?

If so, you’re not alone. In fact, I see this going on all the time with the clients I serve. You see, I’ve had a very unique vantage point to observe the behavior and thinking of database administrators and systems engineers as they worked with me.

And because I was able to observe how they think, I’ve been able to identify the 3 T.O.P. reasons why your SQL Server Always On Availability Group solution is making your life miserable.

Let me explain…

For the past few months, I’ve been on phone calls with database administrators and systems engineers, listening to stories of how their SQL Server Always On Availability Group (AG) solution is making their lives miserable…literally.

One guy works as a senior DBA for a global IT company. He was assigned to take care of their newly deployed, mission-critical SQL Server Always On Availability Group. It’s a 3-replica SQL Server 2016 Always On Availability Group running on a Windows Server 2012 R2 Failover Cluster deployed across three different data centers.

Ever since the Always On Availability Group was deployed, they have been experiencing outage after outage. In his own words, the outages are “taking over his personal life.” It’s all he has been working on, day in and day out. He spends a lot of time on conference calls with the client explaining what caused the outage. His manager is constantly asking why he is spending so much time on the issue. And it doesn’t stop there. He leaves the office well past his scheduled dinner time with family, still on the conference call as he drives home. He doesn’t get to spend time with his family, doesn’t get enough sleep, doesn’t get to eat well… The list goes on.

Another was a lady who is also a senior DBA, working for a software-as-a-service (SaaS) company that provides management information systems for educational institutions. It’s her first time working with SQL Server 2017 Always On Availability Groups, running on Amazon EC2 across different AWS regions, as she migrates their old database mirroring solution from SQL Server 2008 R2. She has already spent months trying to do things on her own, which has kept her from more meaningful tasks that contribute to the growth of the business. As the only senior DBA on her team, she’s overworked and overwhelmed, nervous and scared that she won’t meet the deadline to migrate all of the databases. I could hear it in her voice when I was on the phone with her.

And, then, there’s . . . you get the picture.

DOES THIS SOUND LIKE YOU?

Keep in mind, these are senior DBAs with years of technical experience. They are not recent college graduates with no real-world DBA experience. Nor are they consultants who only know the concepts but don’t have any battle scars to prove their experience. Yet working with Always On Availability Groups is literally making their lives miserable. And it breaks my heart hearing their stories, because nobody intentionally skips dinners with family, endures sleepless nights paralyzed with fear of missing a deadline, or cancels vacations because an outage has to be resolved quickly. These things just happen.

That’s why you need to know the 3 T.O.P. reasons why your SQL Server Always On Availability Group solution is making your life miserable. You want to avoid making these mistakes so they don’t make your life miserable.

1. Team Capabilities (skillset). Some consultants or engineers build complex Always On Availability Group solutions to prove that they are smart. Some do it so they can add another entry to their resume. What’s worse is when a different person or team takes care of operations after the solution is built. I’ve been in customer environments where an external consultant built the solution, yet the team managing it lacked the technical capabilities to do so.

One of my customers reached out to me initially to help them build a SQL Server 2017 Always On Availability Group solution on Linux. They are a company that specializes in developing technologies for secure, trusted identities. The majority of their infrastructure technologies run on Linux, and their developers run one of the popular Linux distributions. Active Directory and their mission-critical database are the only two IT assets running on Windows Server. Seems like a perfect opportunity to run SQL Server 2017 Always On Availability Groups on Linux, right?

During our initial discovery call, I asked who else in the IT organization knew how to work with Linux Pacemaker, enough to troubleshoot an outage in case something went wrong. To my surprise, no one did. Despite the fact that they ran a huge Linux environment, no one knew a thing about Linux Pacemaker.

That was enough information for me to tell them NOT to deploy SQL Server 2017 Always On Availability Group on Linux.

Imagine what would happen if they got called in to fix an outage. Let’s say they were on vacation or a holiday. Even if it was just an ordinary day. Because they do not have the deep technical expertise to manage Linux Pacemaker, a troubleshooting call can last for hours. Got that call at 3AM? You’re not going back to sleep until the databases are back online.

Don’t build a solution without the right people and technical capabilities to manage it operationally.

2. Operational Processes. Whether you have a small or a large team, a formal operations process can help improve operational efficiency. Do you have a proper change management process in place to track changes to the SQL Server Always On Availability Group environment? Do you have documentation and a runbook that the other members of the team can refer to during an outage? Who will receive the email alert from your monitoring tool? Do they know how to handle the issue when they receive the email alert (going back to #1)? What does your escalation procedure look like?

I was in the same boat several years ago as a data center engineer. We had a different team managing operations. Technically, I wasn’t the on-call engineer. But because we hadn’t properly defined the escalation procedures, the operations team would simply create a ticket once they received an email alert from our monitoring tool and immediately escalate it to me, without even looking at what the alert was about. I got a phone call almost every time there was an issue. And it didn’t matter whether I was having lunch, in the toilet, on the bus on my way home, or already in bed sleeping, I would still get the phone call.

This is the very reason I emphasize the creation of a formal operational process, be it for a small or a large team. The last thing I want is to get interrupted during a special holiday dinner because there were no proper escalation procedures.

3. Planned Spending (budget). It’s interesting to see how many environments do not have a monitoring tool specifically for their SQL Server Always On Availability Group environment. They have already spent thousands of dollars on hardware, software and consulting services, yet monitoring never made it into the budget. What ends up happening is that incident response becomes reactive: the team fixes outages after the fact instead of proactively preventing them from happening in the first place.

One of my clients could have prevented an outage if they had been notified early on that the file share witness for their Windows Server Failover Cluster had been inaccessible for weeks prior to the incident. They were running a 2-node Windows Server 2012 R2 Failover Cluster with a 2-replica SQL Server Always On Availability Group. Because the file share witness was inaccessible, only 2 of the failover cluster’s 3 votes were available. When one of their engineers rebooted the secondary replica during a planned maintenance window, the entire failover cluster went offline, taking the SQL Server Always On Availability Group with it. That’s because, with only 1 of 3 votes remaining, the failover cluster no longer had quorum. A monitoring tool that alerted on the inaccessible witness could have prevented the outage.
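To make the vote math in that story concrete, here is a minimal Python sketch of a simple majority check. The function name `has_quorum` and the static-majority rule are my own illustration, not a Windows Server API, and the sketch deliberately ignores any dynamic vote adjustments a real failover cluster may apply.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True when more than half of all configured votes are online.

    Assumption: a plain static majority rule, for illustration only.
    """
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return votes_online > total_votes // 2


# Scenario from the story: 2 nodes + 1 file share witness = 3 configured votes.
TOTAL_VOTES = 3

healthy = has_quorum(TOTAL_VOTES, 3)        # both nodes and the witness reachable
witness_down = has_quorum(TOTAL_VOTES, 2)   # witness inaccessible, both nodes up
after_reboot = has_quorum(TOTAL_VOTES, 1)   # witness inaccessible, one node rebooting

print(healthy, witness_down, after_reboot)  # True True False
```

The middle case is the dangerous one: the cluster still looks healthy with the witness gone, so nothing forces anyone to notice until the next node goes down.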

You know what else isn’t typically included in the project budget? TRAINING. A complex SQL Server Always On Availability Group environment is designed and built, and the team is expected to just know what to do (again, going back to #1). And we’re talking about mission-critical, revenue-generating databases.

I had to go through major surgery after breaking my right leg in an accident last year. Nobody in their right mind would allow an intern, or a surgeon who has never done anything like this before, to insert a titanium rod in their leg. I’m sure I wouldn’t. I’m glad that my surgeon had done this operation multiple times for patients who had similar injuries.

Yet we allow those without proper training and experience to design, implement and manage SQL Server Always On Availability Groups that run mission-critical, revenue-generating databases.

If you’re struggling with managing a SQL Server Always On Availability Group environment and you feel like it is already making your life miserable, head over to this site to see my calendar and schedule a FREE discovery call. Grab whatever appointment time works for you. We will get on the phone for about 45-60 minutes.

If I believe I can help you, I’ll let you know. If not, I’ll let you know that, too (and yes, I really do say “no” to 30% of the people I talk with).

Schedule a Call

Additional Resources

Training: SQL Server Always On Availability Group: The Senior DBA’s Ultimate Field Guide
