One of the subjects I had in senior high school was technology and home economics. Since my teacher majored in architectural drafting, she taught us how to read blueprints so we would have an idea of how structures are built. That’s when I noticed a crack in our wall near the TV. In the Philippines, walls are mostly made of reinforced concrete, and I had never paid attention to ours until then. But the crack was becoming more obvious. I could almost see the outside of the house through it, right through the 10-inch-thick wall. That was when my mom hired contractors to fix the crack, patching it up with cement until it was all covered up. Temporarily, that is.
Because less than a year later, the crack decided to come out of hiding. The patched concrete slowly broke down until I could see the outside of the house again. Luckily, we had concrete fences.
For the past decade, I’ve had the opportunity to interact with DBAs, sysadmins, and IT professionals who are responsible for Always On Availability Groups. And whether they are designing a new implementation or making changes to an existing one, the questions they ask mostly revolve around all the different knobs and switches that they have to deal with, things like:
- How do we configure the heartbeat settings for cluster nodes in different AWS regions?
- Where do we place the file share witness?
- What’s the best approach for applying Windows patches in an AG environment?
- How do you find the lag between the primary and the secondary replica in an Availability Group?
They’re all valid questions. And they’re all worth asking. But what I find interesting is that having the answers to these questions doesn’t necessarily solve the real problem.
- The person who asked about configuring cluster heartbeat settings for cluster nodes in different AWS regions now has to deal with uncontrollable log file growth
- The DBA who asked about where to place the file share witness now has to figure out how to convince his boss that they need a third data center
- The sysadmin who asked about the best approach for applying Windows patches in an AG environment suddenly caused a massive outage
- The guy who asked about the lag between the primary and secondary replicas is now struggling to figure out why the AG did not fail over automatically
The more answers they get, the more confused they become. And the Microsoft documentation isn’t helping either. I’m sure you got even more confused reading about how to configure cluster heartbeat settings for a Windows Server Failover Cluster. So it’s not enough to have your questions answered, because the answers will raise even more questions that you probably don’t have the skills to handle. Knowing why you get confused can help you make better decisions when building and managing Always On Availability Group solutions.
1) They start with complex configurations, not the simple ones. An Always On Availability Group running on top of a Windows Server Failover Cluster is a very complex environment with a ton of moving parts that you, as a SQL Server DBA, have no control over. Your job as a DBA now depends on things you may not even be aware of: Active Directory, DNS, networking, failover clustering. And that’s before you consider all the other possible combinations like HA with DR, on-premises with cloud, physical machines with VMs, etc. Starting with complex configurations without a good grasp of the simple ones is like me, as a high school student, creating a blueprint for a suspension bridge. The more complex a solution is, the less confident you will be designing or even managing it. Simple configurations, on the other hand, let you understand things easily. They let you see how one piece interacts with another and how one action can lead to another. They also help you build confidence. A high school student can work through calculus problems far more easily than he can calculate the load the reinforced concrete of a suspension bridge can bear. He can still design a suspension bridge later, once he decides to pursue a career in structural engineering.
2) They focus on the HOW, not the WHY. Most of the questions being asked fall under the HOW category: “How do I do this?” or “How do I achieve this?” Most technical professionals fall into this trap because we love solving problems. We were trained to solve problems. And I’m no different. I love getting my hands dirty and seeing the huge impact of what I do. But we rarely ask the WHY questions. “Why do we need to stretch the AGs across different data centers?” “Why do we need to minimize the lag between the primary and secondary replicas?” “Why do we need AGs in the first place?” Asking the WHY questions helps us filter out the HOW questions we actually need to ask. I mean, what’s the point of asking how to stretch the AGs across two different data centers if we don’t even know why we need AGs in the first place?
3) They skip the basics. The biggest mistake I see tech professionals make is getting fixated on features and technologies. That’s why they get easily overwhelmed when a new feature is introduced. They look at it like it’s a brand new thing that they need to learn. I blame the Internet for that. You can now search for anything and everything online, so nobody bothers to understand how things work. Somebody teaches a hack, they try it, and it works. Problem solved. Until the problem happens again. You go back to your favourite search engine looking for how to solve the same problem. And if the keywords match the first few results, that’s what you go for. Imagine a suspension bridge designed by a senior high school student whose blueprint was based on the answers he got to a collection of questions he posted on StackExchange. Or a medical student about to perform surgery based on a few episodes of Grey’s Anatomy (or maybe ER). Scary, isn’t it? Yet thousands of DBAs are given responsibility for mission-critical databases without understanding the basics and fundamentals of what makes Availability Groups and Windows Server Failover Clustering work. The more complex a solution is, the more you need a good grasp of the basics and the fundamentals.
Let’s face it. If part of your responsibilities as a SQL Server DBA is deploying and managing Always On Availability Groups, you need to build confidence by acquiring the skills to get the job done. You need to learn how to ask the WHY questions so you don’t get overwhelmed by all the confusing and complex configuration settings that may not be necessary for your specific implementation. And never skip the basics. I cannot emphasize that enough.