I was at Microsoft Ignite last week, and I spent the better of part of 3 days working in the “SQL Server” booth. I put SQL Server in quotes, because in 2019 that term is stretched. Our booth covered SQL Server 2019, Arc (which you should read about because it’s really cool, but is the subject of another post), SQL on Edge, SQL Server Big Data Clusters, Azure SQL Database, and Azure SQL Database Managed Instance, and a few other things that I’m too tired to type. And that doesn’t even include Microsoft Spectre (which is the artist formerly known as Azure SQL Data Warehouse)
My big takeaway from working the booth, was how most of the customers I spoke to, didn’t understanding the differences between the various platform options. In fact, I think I took the following question over 100 times: “We’re currently on-premises, but moving to the cloud, can you explain the difference between Azure SQL Database, Managed Instance, and SQL Server in a VM?” The fact that we had to answer that so much, isn’t a failing on the part of the customers, as it is Microsoft and us MVPs not communicating the message effectively. So, in order to do my part, I’m going to write a post (I also wrote a Redmond column last month, but this post will go into some more detail). I’m not going to talk about pricing in detail in this post, as it’s effectively the same as the two services.
First Things First—Do you Know What PaaS Means?
Azure SQL Database and Managed Instance are both known as Platform as a Service (PaaS) offerings. This means you the DBA will manage a lot less than you typically would on either an on-premises SQL Server, or even a VM running SQL Server in Azure on the Infrastructure as a Service (IaaS) offering. If you run one of the PaaS offerings, Microsoft is responsible for the following tasks:
- Backup of your database and transaction logs
- High Availability within your Azure region, with a financial guarantee (the number of 9s depends on config, but it’s pretty solid)
- Installation, configuration, and patching of SQL Server software
- Installation, configuration, and patching of Windows server
- Consistency checks of your database
- Performance monitoring of your database (This is configured for you, you still have to react to the numbers)
What tasks does that leave you to do?
- Configuring disaster recovery across Azure regions—while this is just a few clicks, you get to convince your boss to pay for it, and it’s not turned on by default
- Performance tuning—getting utilization down by query optimization can save your company money. It’s a quantifiable part of your job now
- Network architecture—you need to be able to make sure your virtual networks in Azure can all talk to your database services, and in a secure fashion
- User Management and Security (though there are some security benefits in PaaS)
- Helping your business derive value from the data you are storing in the database
Ok, Tell Me About Cloud Offerings
I’m going to start out by talking about running SQL Server in an Azure VM. Why would you choose to do this, when the aforementioned PaaS services do all of this magic for you? The main reason I can think of is vendor support. Your business runs some high quality ISV software that requires you to remain on SQL Server 2012 CU1 (just kidding, you know it’s actually SQL 2005 RTM, and Windows 2003), and you can only run this mission critical conference room projection software on that version of SQL Server.
The other reasons why you might choose to run SQL in a VM besides vendor support (which is common) is that the size of your database doesn’t work well with the current limits in Azure PaaS (which is 8 TB in Managed Instance, and effectively unlimited in Azure SQL Database Hyperscale), or that you are willing to trade off some availability for reduced cost of running in a standalone VM.
What do I mean by costs? While PaaS offerings include high availability, that’s something Microsoft has to include in your pricing (otherwise, their shareholders, including me, would murder them), so compared to a VM that is standalone, the PaaS offering will commonly be more expensive. To get a better cost estimate, you should really multiply your IaaS costs by 3. (Also, VMs have pretty good availability—it can be enough, especially for departmental apps).
Azure SQL Database/Azure SQL Database Hyperscale
This is truly one of the OG Azure offerings, dating back to 2009/10. It is really designed to be a standalone database offering. Think of an app, that needs a single database, and wants minimal effort around spinning that up, and you have the right answer for Azure SQL Database. The cost for getting started is also really cheap at $5/month for a basic DB (it can be way, way more than that, the P15 database is $15k USD/month, and there’s a new M-series offering that will be even more). Effectively, an Azure SQL Database is a single user database running in a SQL Server instance somewhere in the cloud. There’s a feature called elastic pools that confuses this a little bit.
Elastic pools are a group of Azure SQL Databases that share the same resources. This is effectively a SQL Server instance somewhere in the cloud that hosts multiple user databases. The elastic pool feature is mostly designed for applications where each customer gets a copy of the same database, and workload is mixed at different times of the day, so resources are utilized most effectively.
Azure SQL Database has some tools called elastic query, elastic transactions, and elastic jobs, that allow you to perform actions across multiple databases (elastic is a bit of a misnomer here, there’s no direct connection to elastic pools), but in general cross-database queries using classical three part naming database.schema.object do not work in Azure SQL Database, so if you need to joins or complex queries across databases, this probably isn’t your cloud product.
Way back when Azure SQL Database was size limited to 5 GB, then ultimately 4 TB, until the hyperscale architecture was developed. Hyperscale is a completely model for physical database infrastructure design (read more about it here, it’s really cool) and should be able to scale to near petabyte scale. If you’ve got more than that much data in a single database, we really need to talk. The one challenge with SQL DB and database size, is that you can’t restore to it, you have to use a database export and load data using a BACPAC, so it can be very slow for very large databases. My guidance here is that if you have a very large on-premises database that you’d like to move, Managed Instance may be a better choice, as the migration path is easier. If you are building a new application in the cloud, Azure SQL Database is frequently the best choices.
There are a few other limitations of on-premises features that Azure SQL Database doesn’t support. Those are as follows:
- Common Language Runtime (CLR)
- Change Data Capture
- Cross-Database Transactions
- Database Mail
- Linked Servers* (sort of—you can connect to external data sources)
- Replication Source (you can publish to, but not from an Azure SQL DB)
- Resource Governor
- Service Broker
- SQL Server Agent
Of those features, the biggest one is SQL Server Agent. There are a number of different ways to schedule jobs to run in Azure, the most common being Azure Automation, which allows for PowerShell or Python. You can also have an Agent job on a SQL VM running full-blown SQL Server, though that seems less practical.
I mentioned new application development as a target for Azure SQL Database. Since, SQL DB allows for your database to be a replication target, it can also be a good use case for reporting, particularly for field users who are afar from the home office. You can leverage either geo-replication, or just have multiple replication targets, and allow field users to report against them.
From a network perspective, your SQL Database is on the internet, with a firewall in front of it (either at the server or the database level). You can also elect to only allow traffic from a Virtual Network within Azure, however your database is still on shared networking infrastructure (albeit secured).
Azure SQL Database Managed Instance
This service came much later, being announced at Build in 2017, and going GA last year. Managed Instance was specifically designed to make it easier to move from on-premises SQL Server to a PaaS services. One of the initial differences was that Managed Instances uses a virtual core-based pricing model (Azure SQL Database still offers a legacy DTU based model), so you pay a price per core, depending on your tier.
The two tiers of Managed Instances (and vCore based Azure SQL Database) are general purpose and business critical. The two key differences between these tiers are performance and high availability models—general purpose (GP) uses networked SSDs and will generally have IO latency in the 5-10 ms range. If you are using GP, you need to read this blog post on how to optimally configure your storage, as other it can be really painful. GP also uses the underlying storage model for its high availability. Business critical (BC) uses the local SSD that’s in each server host for storage, which is lower latency than a network call, and effectively uses an AlwaysOn Availability Group with four replicas for its availability model. You can also query the secondary replica on BC, by using the application intent=read only connection syntax in the SQL client.
In terms of migration, you can restore a backup (from Azure Blob Storage using the RESTORE FROM URL syntax) into Managed Instance. For a more real-time migration strategy, you can use the Data Migration Services tool which will allow for an almost zero-downtime migration. There’s been talk of allowing log shipping, or at least “log shipping” (via an API), but last I checked (and I’ll check again) you had to use DMS.
Features in MI are much closer to on-premises SQL Server. The big gap closers between MI and SQL DB are that the SQL Agent is there, CDC is there, and cross-database queries can be performed. The following list is the features that are included in Managed Instance and not in SQL DB:
- Copy-only backup
- Cross-Database Queries/Transactions
- Linked Servers (Only to MI and other SQL Servers)
- Resource Governor
- Service Broker (within instance)
The gaps to on-premises are as follows:
- R Services
- SQL Server Analysis/Integration* (there’s a way to do this with Azure Data Factory, but it costs money)/Reporting Services
There are a few other differences, but I went for the ones that I see as most commonly affecting customers. One thing to note for both MI and Azure SQL DB is that they rely on Azure Active Directory and not on-premises Active Directory to provide authentication. I hadn’t seen this as a big gap with my customers, however it came up a few times at the booth last week, and Microsoft is researching options in this space.
The other limitation Managed Instance currently faces, is that there is a maximum of 8 TB database total size currently. The Managed Instance team is in the process of implementing hyperscale, and this limit will go away when that happens, which will likely be sometime next year.
So, I’ve almost written 2000 words on this. So therefore it needs a tl;dr
Azure SQL Database—Use It If
- You are building a new cloud based application
- You want to build a reporting option for field users
- You are building a customer facing data platform and you want simplified networking and auth
Azure SQL Database Managed Instance—Use it if
- You are migrating 3rd party ISV apps to a PaaS model
- You want a PaaS model, but don’t want to change any of your code
- Your app depends on SQL Agent, or cross-database queries
SQL Server in a VM—Use It If
- you are stuck on an old version of SQL Server
- You have to use on-prem Active Directory for authentication
- 9% availability is good enough for your application
- You are one of those people who run SSAS/SSIS/SSRS/R Services/RDBMS/DQS/MDS all on the same server to maximize every penny of your licensing
- You are the person who uses Filestream