Avoid These High Availability Mistakes That Can Cost a SQL Server DBA's Job

I’ll never forget a conversation I had with a customer several years ago. I was asked to join a conference call to discuss a new SharePoint 2013 farm deployment. My involvement was limited to designing and deploying a SQL Server 2012 Availability Group for all of their environments – a different vendor was engaged to deal with the SharePoint implementation.

 

If you’ve been invited to join a conference call as a technical expert, you know what I mean when I say “it’s one of those calls.” My ears hurt as we went past the 2-hour mark, and I kept thinking to myself, “How long are we still going to be on this call?”

It wasn’t fun to be on that call. But I was very thankful to be involved in that project, because I learned a lot from it: lessons that can help SQL Server DBAs avoid the common high availability mistakes that can cost them their jobs.

  1. Relying on outdated experience and information. If you’ve deployed a SQL Server failover clustered instance in the past, you are probably aware of the Hardware Compatibility List (HCL) found on the Windows Server Catalog website. That’s because you used to need the exact same hardware with the exact same software, firmware, updates, and patches on all of the nodes in the Windows Server Failover Cluster (WSFC) for it to be considered supported by Microsoft. Oh, and you needed to buy very expensive SAN storage to run SQL Server on a WSFC. Well, that’s exactly what that specific customer did. Based on what their resident SQL Server specialist told them, they bought new Dell Compellent SC8000 storage specifically for the SharePoint databases. For an hour and a half on the phone, I tried to explain that SQL Server Availability Groups do not require shared storage. Unfortunately, the storage had been purchased two months before that call. Outdated information becomes even riskier when the solution is already in place and the availability of a mission-critical database is at stake.

  2. Working in isolation. As the project went along, I provided step-by-step documentation on how to install and configure a failover cluster for their SQL Server Always On Availability Group. Because this was a large organization, the teams were siloed and assigned specific tasks: there was a network team, a systems team, a database team, an application team, and so on. One of their systems engineers responsible for building the WSFC could not get past the Create Cluster Wizard, so he gave me a call. Realizing what the problem was, I told him to get the Active Directory (AD) team involved to resolve the issue. It turned out that his AD account did not have permissions to create the cluster name object (CNO) in AD (I cover this concept in this blog post, and it is still the most popular blog post to date). But what’s really interesting is that he already suspected the issue was related to AD. He just didn’t bother getting the AD team involved before reaching out to me. Keep in mind that, as a consultant, I’m still an outsider.

  3. Not thoroughly testing the solution. Need I say more? This is especially true with tight deadlines. When the availability of mission-critical databases is at stake, make sure that at least the minimum required tests are performed to confirm that availability goals are met. Power failure? Checked. Network failure? Checked. Server blue screening? Checked. Prepare a checklist of things that you need to test to validate whether or not your solution meets the availability goals. I say this because, after going live with the project, our operations team got blamed when their primary Availability Group replica was taken offline even though nothing was wrong with it. I referred them to point #1 in this list. They didn’t like it when I said, “it’s by design.” I’ll save the story for another blog post.

  4. Becoming too focused on the technology. As technology professionals, we get so attached to our work that we fail to step back and look at the bigger picture. We forget that solutions are useless if there are no business problems to solve. While I’m a big fan of WSFC for SQL Server, I don’t instantly recommend it as a solution without understanding what the real requirement is. There’s a reason why I always start any high availability and disaster recovery discussion with The Alphabet Soup for HA/DR. When the customer told me they intended to deploy SQL Server Always On Availability Groups in all of their environments, I suggested a different architecture for production: a SQL Server failover clustered instance (FCI) for the production environment combined with an asynchronous Availability Group for DR. When they asked me why I preferred that design over the original one, mentioning the cost difference between SQL Server licenses and the SAN storage was more than enough to convince them. Plus, they got to use one of their Dell Compellent SC8000 storage arrays.
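
To make the checklist idea from mistake #3 a little more concrete, here is a minimal Python sketch of how such a test checklist could be tracked. Everything in it is an assumption for illustration: the `FailureTest` class, the `unresolved` helper, and the expected and observed outcomes are hypothetical placeholders, not results from the project described above. Only the three scenario names come from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureTest:
    """One failure scenario to validate against the availability goals."""
    scenario: str
    expected: str                   # what the design says should happen
    observed: Optional[str] = None  # filled in only after the test is run

    @property
    def status(self) -> str:
        if self.observed is None:
            return "NOT TESTED"
        return "PASS" if self.observed == self.expected else "REVIEW"


def unresolved(checklist):
    """Return scenarios that are untested or did not behave as designed."""
    return [t.scenario for t in checklist if t.status != "PASS"]


# Scenario names come from the post; expected outcomes are placeholders.
checklist = [
    FailureTest("Power failure", "automatic failover"),
    FailureTest("Network failure", "automatic failover"),
    FailureTest("Server blue screen", "automatic failover"),
]

# Hypothetical results recorded after running two of the three tests.
checklist[0].observed = "automatic failover"
checklist[1].observed = "primary replica taken offline"

for test in checklist:
    print(f"{test.scenario}: {test.status}")
print("Still open before go-live:", unresolved(checklist))
```

The point of the `REVIEW` status is that a surprising result is not automatically a bug. It may turn out to be behavior that is by design, but you want to find that out during testing, not after going live.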

 

If you need help with your SQL Server Always On deployments, reach out and schedule a call with me using my online calendar.
