Avoid These High Availability Mistakes That Can Cost a SQL Server DBA’s Job – Learn SQL Server High Availability & Disaster Recovery

I’ll never forget a conversation I had with a customer several years ago. I was asked to join a conference call to discuss a new SharePoint 2013 farm deployment. My involvement was limited to designing and deploying a SQL Server 2012 Availability Group for all of their environments; a different vendor was engaged to handle the SharePoint implementation.

If you’ve been invited to join a conference call as a technical expert, you know what I mean when I say “it’s one of those calls.” My ears hurt as we went past the 2-hour mark, and I kept thinking to myself, “How much longer are we going to be on this call?”

It wasn’t fun to be on that call. But I was very thankful to be involved in that project.

I learned a lot from that conference call: lessons that can help SQL Server DBAs avoid common high availability mistakes that can cost them their jobs.

1. Relying on outdated experience and information. If you’ve deployed a SQL Server failover clustered instance in the past, you are probably aware of the Hardware Compatibility List (HCL) found on the Windows Server Catalog website. That’s because you need to have the exact same hardware with the exact same software, firmware, updates, and patches on all of the nodes in the Windows Server Failover Cluster (WSFC) for it to be considered supported by Microsoft. Oh, and you need to buy very expensive SAN storage to run SQL Server on a WSFC. Well, that’s exactly what this customer did. Based on what their resident SQL Server specialist told them, they bought a new Dell Compellent SC8000 storage array specifically for the SharePoint databases. For an hour and a half on the phone, I tried to explain that SQL Server Availability Groups do not require shared storage. Unfortunately, the storage had been purchased two months before that call. Outdated information becomes even riskier when the solution is already in place and the availability of a mission-critical database is at stake.

2. Working in isolation. As the project progressed, I provided step-by-step documentation on how to install and configure a failover cluster for their SQL Server Always On Availability Group. Because it was a large organization, the teams were siloed and assigned specific tasks: there was the network team, the systems team, the database team, the application team, and so on. One of their systems engineers, who was responsible for building the WSFC, could not get past the Create Cluster Wizard, so he gave me a call. Realizing what the problem was, I told him to get the Active Directory (AD) team involved to resolve the issue. It turned out that his AD account did not have permissions to create the cluster name object (CNO) in AD. (I cover this concept in this blog post, and it is still my most popular blog post to date.) What’s really interesting is that he already suspected the issue was related to AD. He just didn’t bother getting the AD team involved before reaching out to me. Keep in mind that, as a consultant, I’m still an outsider.

3. Not thoroughly testing the solution. Need I say more? This is especially true with tight deadlines. When the availability of mission-critical databases is at stake, make sure you have performed at least the minimum set of tests required to prove that your availability goals are met. Power failure? Checked. Network failure? Checked. Server blue screening? Checked. Prepare a checklist of things you need to test to validate whether or not your solution meets the availability goals. I say this because, after the project went live, our operations team got blamed when their primary Availability Group replica was taken offline even though nothing was wrong with it. I referred them to point #1 in this list. They didn’t like it when I said, “It’s by design.” I’ll save that story for another blog post.

4. Becoming too focused on the technology. As technology professionals, we get so attached to our work that we fail to step back and look at the bigger picture. We forget that solutions are useless if there are no business problems to solve. While I’m a big fan of WSFC for SQL Server, I don’t instantly recommend it as a solution without understanding what the real requirement is. There’s a reason why I always start any high availability and disaster recovery discussion with The Alphabet Soup for HA/DR. When the customer told me they intended to deploy SQL Server Always On Availability Groups in all of their environments, I suggested a different architecture for production: a SQL Server failover clustered instance (FCI) for high availability, combined with an asynchronous Availability Group for disaster recovery. When they asked why I preferred that design over the original one, mentioning the cost difference between the SQL Server licenses and the SAN storage was more than enough to convince them. Plus, they got to put their Dell Compellent SC8000 storage to use.
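A test checklist like the one described in point 3 can be as simple as a list of failure scenarios, each paired with how long it took for the databases to become available again. The sketch below uses Python purely for illustration. The `FailureTest` class, the `unmet_tests` function, and the use of a recovery time objective (RTO) in seconds as the availability goal are all assumptions, not something taken from the project described above. The numbers are made up. The function flags every scenario that was either never tested or took longer than the RTO.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureTest:
    """One scenario on a hypothetical HA test checklist."""
    scenario: str
    # Seconds until the databases were available again; None means not yet tested.
    measured_recovery_seconds: Optional[float] = None


def unmet_tests(tests: List[FailureTest], rto_seconds: float) -> List[str]:
    """Return a message for each scenario that was not tested or exceeded the RTO.

    Assumption: the availability goal is expressed as an RTO in seconds.
    """
    failures = []
    for test in tests:
        if test.measured_recovery_seconds is None:
            # An untested scenario is treated as a failure, not a pass.
            failures.append(f"{test.scenario}: not tested")
        elif test.measured_recovery_seconds > rto_seconds:
            failures.append(
                f"{test.scenario}: {test.measured_recovery_seconds:.0f}s "
                f"exceeds RTO of {rto_seconds:.0f}s"
            )
    return failures


if __name__ == "__main__":
    # Illustrative numbers only; replace them with results from your own failover tests.
    checklist = [
        FailureTest("Power failure on primary replica", 45),
        FailureTest("Network failure on primary replica", 95),
        FailureTest("Primary server blue screen"),
    ]
    for problem in unmet_tests(checklist, rto_seconds=60):
        print(problem)
```

With these sample values, the script reports the network failure scenario as too slow and the blue screen scenario as never tested. The key design choice is that a missing result counts against you, so a checklist item cannot quietly pass just because nobody got around to testing it.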

If you need help with your SQL Server Always On deployments, reach out and schedule a call with me using my online calendar.

2 comments on “Avoid These High Availability Mistakes That Can Cost a SQL Server DBA’s Job”

Rowdy Vinson says:

Hi Edwin, great post and advice. I recently rolled out an HADR SQL solution at my shop and it’s working like a champ because we followed a very similar set of rules. Glad to see them so well written and presented. One thing I think you may have overlooked, or just didn’t dig into, is the depth of expertise required to support some of the more advanced solutions. An AG in 2 datacenters will have a significantly different skills requirement than a simple DB mirroring setup. It is important to factor in the supportability of the solution in the given environment.

Edwin M Sarmiento says:
  
  at
  
  Thank you for reading, sir.
  
  Agreed, the skills and the depth of expertise required to support some of the more advanced solutions are different from those required for the simple ones. And that spells the difference between a successful and a failed implementation and operational support.
  
  When I started teaching and delivering presentations on Availability Groups back in 2011, I claimed that I could get in trouble with Microsoft marketing for saying that Availability Groups are really nothing new. That’s because the technology is based on Windows Server Failover Clustering and Database Mirroring, technologies that already existed in SQL Server 2008. But instead of learning just one of them, you now need to learn both.
  
  This is why I decided to create this online course: to enable SQL Server DBAs to be more confident in building SQL Server HA/DR solutions that rely on Windows Server Failover Clustering, be it failover clustered instances or Availability Groups. I also leveraged my experience as a former data center engineer to highlight the need to acquire the different skills required to support a complex HA/DR solution.
