Building a Data as a Service Platform on Azure SQL Database

One of the benefits of cloud computing is flexibility and scale—I don’t need to procure hardware or licenses as you get new customers. This flexibility and platform as a service offerings like Azure SQL Database allow a lot of flexibility in what independent software vendors or companies selling access can provide to their customers. However, there is a lot of work and thought that goes into it. We have had success with building out these solutions with customers at DCAC, so in this post, I’ll cover at high level some of the architectural tenants we have implemented.

Authentication and Costing

The cloud has the benefit of providing detailed billing information, so you know exactly what everything cots. The downside to this is that the database provided is very granular and detailed and can be challenging to breakdown. There are a couple of options here—you can create a new subscription for each of your customers which means you will have a single bill for each customer, or you can place each of your customers into their own resource, and use tags to identity which customer is associated with that resource group. The tags are in your Azure bill and this allows you to break down your bill by each customer. While the subscription model in cleaner in terms of billing, however it adds additional complexity to the deployment model and ultimately doesn’t scale.

The other thing you need to think about is authenticating users and security. Fortunately, Microsoft has built a solution for this with Azure Active Directory, however you still need to think about this. Let’s assume your company is called Contoso, and your AAD domain is contoso.com. Assuming you are using AAD for your own business’s users, you don’t want to include your customers in that same AAD. The best approach to this is to create a new Azure Active Directory tenant for your customer facing resources—in this case called cust.contoso.com. You would then add all of the required accounts from contoso.com to cust.contoso.com in order to manage the customer tenant. You may also need to create a few accounts in the target tenant, as there are a couple of Azure operations that require an admin from home tenant.

black cassette tape on top of red and yellow surface
Photo by Stas Knop on Pexels.com

Deployment of Resources

One of the things you need to think about is what happens when you onboard a new customer. This can mean creating a new resource group, a logical SQL Server, and a database. In our case, it also means enabling a firewall rule, and enabling performance data collection for the database, and a number of other configuration items. There are a few ways you can do this—you can use an Azure Resource Manager (ARM) template, which contains all of your resource information, which is a good approach that I would typically recommend. In my case, there were some things that I couldn’t do in the ARM template, so I resorted to using PowerShell and Azure Automation to perform deployments. Currently our deployment is semi-manual as someone manually enters the parameters into the Azure Automation runbook, but it could be easily converted to be driven by an Azure Logic App, or a function.

Deployment of Data and Data Structures

When you are dealing with multiple databases across many customers, you desperately want to avoid schema drift that can happen.  This means having a single project for all of your databases. If you have to add a one-off table for a customer, you should still include it in all of your databases. If you are pushing data into your tables (as opposed the data being entered by the application or users) you should drive that process from a central table (more to come about this later).

Where this gets dicey is with indexes, as you have may have some indexes that are needed for specific customer queries. In general, I say the overhead on write performance of having some additional indexes is worth the potential benefit on reads. How you manage this is going to depend on the number of customer databases you are managing—if you are you have ten databases, you might be able to manage each databases indexes by themselves. However, as you scale to a larger number of databases, you aren’t going to be able to manage this by hand, Azure SQL can add and drop indexes it sees fit, which can help with this, but isn’t a complete solution.

Hub Database and Performance Data Warehoue

Even if you aren’t using a hub and spoke model for deploying your data, having a centralized data repository for metadata about your client databases. One of the things that is a common task is collecting performance data across your entire environment. While you can use Azure SQL Diagnostics to capture a whole lot of performance information in your environment, with one of our clients we’ve taken a more comprehensive approach combining the performance data from Log Analytics, Audit data that also goes to Log Analytics, and the Query Store data from each database. While log analytics contains data from the Query Store, there was some additional metadata that we wanted to capture that we could only get from the Query Store directly. We use Azure Data Factory packages (which were built by my co-worker Meagan Longoria (b|t) to a SQL Database that serves as a data warehouse for that data. I’ve even built some xQuery to do some parsing of execution plans, to identity which tables are most frequently queried. You may not need this level of performance granularity, but it is a talk you should have very early in your design phase. You can also use a 3rd party vendor tool for this—but the costs may not scale if your environment grows to be very large. I’m going to do a webinar on that in a month or so–I need to work it out the details, but stay tuned.

Final Things

You want to have the ability to quickly do something across your environment, so having some PowerShell that can loop through all of your databases is really powerful. This code allows you to make configuration changes across your environment, or if you use dbatools or invoke-sqlcmd to run a query everywhere. You also probably need to get pretty comfortable with Azure PowerShell, as you don’t want to have to change something in the Azure Portal across 30+ databases.

Share

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?