Azure SQL Offers Manual Failover for PaaS Resources

Sometimes having the right command in place opens new doors to test things, like a failover, for example.  In this post we will take a look at a new capability that has recently surfaced within the Azure ecosystem to help manage failovers.  Let’s jump to it.

High availability is a crucial component for data professionals, even when operating in a cloud environment such as Azure.  Thankfully, Microsoft Azure Platform as a Service (PaaS) is architected in a way that offers high availability for services right out of the gate.  This helps to ensure that your databases, such as Azure SQL Database and Azure SQL Managed Instance, are always available without you having to lift a finger.  What’s even better is that Microsoft now offers the ability to manually initiate a failover for these resources, which gives data professionals more granular control.

Previously, the service managed this aspect and Microsoft would initiate the failover if needed.  But what if I wanted to test a failover to see how my applications would react?  Would a failover impact my end users?  There was no way to test this even though the service offers a high level of availability.  Thankfully that has changed, and we can now control, to a degree, failovers for Azure SQL Platform as a Service resources, including Azure SQL Database, elastic pools, and SQL Managed Instance.

How can we manage a high availability failover in Azure SQL PaaS?

To initiate a failover, you must do so through some type of command-line interface.  This means either PowerShell, the Azure CLI, or a REST API call.  There is currently no way to manage this through the portal.  We may see such a capability in the future, but I do not know if or when that would come to fruition.  For the purposes of this post, we will look at PowerShell.
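
The failover cmdlets ship in the Az.Sql module, so you will need the Az PowerShell module installed and an authenticated session before running any of them.  A minimal sketch; the subscription name below is a placeholder:

# Install the Az module if it is not already present (Az.Sql is included)
Install-Module -Name Az -Scope CurrentUser

# Sign in and point the session at the subscription that owns the SQL resources
Connect-AzAccount
Set-AzContext -Subscription "MySubscription"    # "MySubscription" is a placeholder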

There are three PowerShell cmdlets that will fail over Azure SQL resources.

Invoke-AzSqlDatabaseFailover

This cmdlet will fail over an individual database.  If the database is in an elastic pool, the failover will not affect the entire pool and will only affect the database itself.  In testing, failing over a database in an elastic pool did not affect the database’s membership in the pool.  Furthermore, if the database is configured for Availability Zones, the database will be failed over to a secondary zone and all client connections will be redirected to the new primary.

It is also worth noting that there is a “-ReadableSecondary” switch that will instead fail over the readable secondary.  Since you could be using a readable secondary to offload read workloads, it makes sense to test how its failover would impact those workloads.
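
A minimal sketch of what the call could look like; the resource group, server, and database names below are placeholders:

# Fail over a single Azure SQL Database to its secondary replica
Invoke-AzSqlDatabaseFailover -ResourceGroupName "rg-sql-demo" -ServerName "sql-demo-server" -DatabaseName "AdventureWorks"

# Fail over the readable secondary replica instead of the primary
Invoke-AzSqlDatabaseFailover -ResourceGroupName "rg-sql-demo" -ServerName "sql-demo-server" -DatabaseName "AdventureWorks" -ReadableSecondary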

Invoke-AzSqlElasticPoolFailover

This cmdlet will fail over an entire elastic pool, which means all of the databases within the pool will fail over.  This cmdlet is handy if you are utilizing elastic pools to help minimize Azure costs but still want to test a failover.
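
A sketch of a pool-level failover, again with placeholder names:

# Fail over every database in the elastic pool at once
Invoke-AzSqlElasticPoolFailover -ResourceGroupName "rg-sql-demo" -ServerName "sql-demo-server" -ElasticPoolName "demo-pool"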

Invoke-AzSqlInstanceFailover

Like its two predecessors, this cmdlet will fail over a SQL Managed Instance.  It also has a readable secondary switch that you can utilize to fail over the readable secondary.
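
A sketch for a managed instance; note that this cmdlet identifies the instance with -Name rather than a server and database pair (the names below are placeholders):

# Fail over the primary replica of a SQL Managed Instance
Invoke-AzSqlInstanceFailover -ResourceGroupName "rg-sql-demo" -Name "sqlmi-demo"

# Fail over the readable secondary replica instead
Invoke-AzSqlInstanceFailover -ResourceGroupName "rg-sql-demo" -Name "sqlmi-demo" -ReadableSecondary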

Are there any limitations?

With great power comes great responsibility, and such is the case here.  Given the intrusive nature of a failover within the Azure ecosystem, it makes sense that you can only fail over a resource every so often.  Currently, at the time of this post, the documentation states you can only fail over every 30 minutes.  However, during testing I received a different error message stating that it is every 15 minutes.

[Image: error message stating there is a 15-minute delay between failovers]

I have given feedback to Microsoft regarding this discrepancy, and the documentation will be updated to reflect a 15-minute window between failover events.
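
Because of this throttling, any script that automates failover testing should expect the call to fail if it is run again too soon.  A minimal sketch of handling that (placeholder names, and the exact error text may vary):

try {
    # -ErrorAction Stop turns the throttling error into a catchable exception
    Invoke-AzSqlDatabaseFailover -ResourceGroupName "rg-sql-demo" -ServerName "sql-demo-server" -DatabaseName "AdventureWorks" -ErrorAction Stop
    Write-Output "Failover initiated."
}
catch {
    # Surface the message so you can tell whether the 15-minute window was the cause
    Write-Warning "Failover was not initiated: $($_.Exception.Message)"
}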

What else would this help fix?

Even with the highly durable infrastructure that Microsoft has built, there are occasions where hardware issues arise and the service might not fail over on its own.  Failing over to a DR solution (such as active geo-replication or auto-failover groups) would resolve the issue, assuming things are configured correctly, but that is more intrusive to the application.  By having the ability to fail over manually, customers can now initiate a failover when hardware issues surface without having to invoke their disaster recovery solutions.

Summary

Microsoft continues to enhance and improve the Azure SQL ecosystem.  The ability to control and test failovers for Azure SQL resources provides a deeper level of control for data professionals.  If you are utilizing Microsoft Azure, or are even planning on moving to Azure, I highly recommend you get familiar with how this feature works so that you can verify with certainty how your applications will handle a database high availability failover.

© 2020, John Morehouse. All rights reserved.


How Do I Make MySQL Highly Available with Open Source Tools?

This is a clickbait post title, sorry. You are here now. The correct answer is that you should purchase MySQL as a database service from your favorite cloud provider (Google, Amazon, and Azure all offer prebuilt database-as-a-service offerings) because they have gone through the trouble of making their solution highly available. I’ll speak to Azure, because I’m familiar with the platform: Microsoft doesn’t employ MySQL high availability per se, however both the storage and the VM are highly available. If there is a disk failure, the service is not impacted, and if there is a failure in the compute tier, a new VM is provisioned.

My second recommendation, if you really, really want to build your own thing, is to build a Windows Server Failover Cluster and use shared storage. Make the MySQL service a clustered resource, and assign a floating IP to the service that will fail over with it. (Yes, I know you have to pay M$ for the Windows Server licenses.)
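
Staying in PowerShell, a rough sketch of that setup is below; the node names, IP addresses, disk name, and MySQL service name are all assumptions, and both nodes would already need the failover clustering feature and MySQL installed:

# Create the cluster from two prepared nodes (names and addresses are placeholders)
New-Cluster -Name "MYSQL-CLU" -Node "node1", "node2" -StaticAddress 10.0.0.50

# Register the MySQL Windows service as a generic clustered role with shared storage and a floating IP
Add-ClusterGenericServiceRole -ServiceName "MySQL80" -Name "MySQL-HA" -Storage "Cluster Disk 1" -StaticAddress 10.0.0.51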

Why shouldn’t you use an open source solution to make your MySQL database highly available? First let’s look at a picture of a common MySQL high availability architecture:

If we think about what we need a clustering solution to provide, it comes down to a few things:

  • Providing a floating IP address to allow connections to the primary
  • Checking the health of the database services and initiating a failover in the event one of them isn’t healthy
  • Executing a clean database failover and providing the ability to easily fail back
  • Ensuring the stability of the overall cluster, maintaining quorum, and avoiding split-brain scenarios
  • Managing any shared storage, if you are using it, to coordinate failover with the services

If you are using SQL Server with Windows Server Failover Clustering, the cluster service takes care of all of the above, and more. When you look to do this on Linux for MySQL, there are about 10 different sets of components you can use to make the service highly available. At the base of all of these solutions is MySQL replication, which is fairly straightforward transactional replication. MySQL’s replication service is fairly robust, and the GTID implementation is pretty solid.

The problem is that the rest of the components are all mix and match. You could use HAProxy to float the IP address, but there’s no way to do a smart health check of the database; it simply does a port connection test. That means if your primary goes away and then comes back, without some advanced configuration your floating IP is going to fail back to the original primary whether or not it’s actually the primary in your replication pair. This is but one example. You are going to end up with 3 or 4 different components to execute each of these functions, and congratulations, you are now in charge of a complex distributed system that you are responsible for administering for the rest of your life.

But Joey, Facebook/Google/pick your other favorite online megacorp runs MySQL and supports it with 5 9s. Ok, sure, I don’t disagree with this, and as databases, MySQL and PostgreSQL are generally ok. But look around at your engineering staff. Wait, do you have engineering staff? If you don’t have a few people with both really good Linux SA skills and DBA skills, you are pretty quickly going to be in a situation where support is a challenge.

Finally, consider whether you even need an HA solution. Are you running on a virtual machine? As long as your infrastructure is solid, that probably gets you to about 99.5% availability on a bad week. What you absolutely want to avoid is the Windows 2000 paradigm, where your high availability solution incurs more downtime than a standalone system would.


Storage Field Day 19 MinIO #SFD19

In January I had the chance to attend Storage Field Day 19 in Santa Clara, where we got to meet with a wide variety of startups and large enterprise storage companies. One of the more interesting companies we met with was MinIO, which has a really interesting and compelling object-based storage product.

[Image: some of MinIO’s customers and use cases]

I’ve talked about object storage here before, but it’s a very different paradigm from the traditional block-based storage you may currently be using. With block storage, files are split into evenly sized blocks of data (typically somewhere between 64 KB and 1 MB, depending on your vendor). Data protection is provided by traditional RAID options.

Object storage, on the other hand, doesn’t split files into blocks. Files are stored as objects which contain the file data, metadata, and a unique identifier. There is no limit on the size or amount of the metadata associated with the file. If you have ever created a managed disk in Azure, taken a backup to URL, or used an Azure SQL Database, you’ve used object-based storage. In object-based storage, redundancy is generally provided by maintaining three copies of the object (e.g., a write isn’t considered complete until it has been written to all three copies).

Object storage is designed to solve problems of scale. One of the things I learned at Comcast was that the cost of SAN storage didn’t scale to some of the massive petabyte-scale data problems we had. The management overhead, the cost, and sometimes even the storage itself do not scale. This is largely a problem for companies like Microsoft, Amazon, Google, and Facebook, who have massive amounts of data to store. But as data volumes grow, there are lots of other firms with very large volumes that they need to manage.

MinIO is a firm that offers such a solution. MinIO offers open source storage management software that delivers extremely fast throughput (183 GB/s reads and 171 GB/s writes). It is fully compatible with Amazon’s S3 API, which has somewhat become the de facto standard for object storage. They were working on Azure Blob Storage support when we visited.
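
Because the interface is S3 compatible, existing S3 tooling can simply be pointed at a MinIO endpoint. As a rough sketch using the AWS Tools for PowerShell (the endpoint URL, credentials, and bucket name are placeholders, and this assumes the AWS.Tools.S3 module is installed):

# Store the MinIO access key and secret as a named credential profile
Set-AWSCredential -AccessKey "minio-access-key" -SecretKey "minio-secret-key" -StoreAs "minio"

# List buckets and upload a file, directing the S3 cmdlets at the MinIO endpoint instead of AWS
Get-S3Bucket -ProfileName "minio" -EndpointUrl "http://minio.example.com:9000"
Write-S3Object -ProfileName "minio" -EndpointUrl "http://minio.example.com:9000" -BucketName "demo-bucket" -File ".\backup.bak"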

One of the ways MinIO is able to get such good performance out of pretty standard hardware is by taking advantage of SIMD processor instructions, which allow more text and number crunching to be performed per CPU instruction, dramatically increasing performance. SQL Server uses the same idea through the query processor’s batch mode.

MinIO’s storage can also be used as a persistent store for Kubernetes (drink), for systems like Spark and TensorFlow, or as a replacement for Hadoop HDFS. In your environment, you would probably use it to replace your file servers, as a target for container storage, or maybe even as an analytic store. Or perhaps you want to become a cloud storage provider and need to host 50 PB of data in your data center.
