Key lookups are one of my least favorite SQL Server execution plan operators. A key lookup means that for each row returned by an index seek, SQL Server goes back to the clustered index to retrieve the remaining columns it needs. Generally, we either live with this condition (for a very small number of rows) or we fix it by adding the missing columns to our nonclustered index, either in the key or as included columns.
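To make that fix concrete, here is a minimal sketch of covering a query with included columns so the lookup disappears. Everything in it (instance, database, table, index, and column names) is a hypothetical placeholder, and I’m running the T-SQL through dbatools’ Invoke-DbaQuery purely for convenience.

```powershell
# A sketch of the "add included columns" fix; all object names are hypothetical placeholders.
Import-Module dbatools

$fixLookup = @"
-- Before: a seek on IX_Orders_CustomerId plus a key lookup to fetch OrderDate and TotalDue.
-- After: the included columns cover the query, so the key lookup goes away.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue)
    WITH (DROP_EXISTING = ON);
"@

Invoke-DbaQuery -SqlInstance 'MyServer' -Database 'MyDatabase' -Query $fixLookup
```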
However, the plan below shows a different scenario: a table with a clustered columnstore index that also has a nonclustered rowstore index on it. The ability to combine the two was added in SQL Server 2016 to allow point lookups on a column without having to scan many rowgroups of the columnstore. This works pretty well in some conditions, though it is important to know that it can slow down your inserts significantly.
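For context, this is roughly what that SQL Server 2016+ pattern looks like; the table and index names below are placeholders I made up, not the customer’s schema.

```powershell
# A sketch of the pattern: a clustered columnstore index plus a nonclustered B-tree index
# for point lookups. All object names are hypothetical placeholders.
Import-Module dbatools

$columnstorePattern = @"
CREATE TABLE dbo.FactSales
(
    SalesKey     bigint NOT NULL,
    OrderDateKey int    NOT NULL,
    CustomerKey  int    NOT NULL,
    SalesAmount  money  NOT NULL
);

-- The analytic workhorse: a clustered columnstore index over the whole table.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- Since SQL Server 2016, you can layer a nonclustered B-tree index on top for point lookups.
CREATE NONCLUSTERED INDEX IX_FactSales_SalesKey ON dbo.FactSales (SalesKey);
"@

Invoke-DbaQuery -SqlInstance 'MyServer' -Database 'MyDw' -Query $columnstorePattern
```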
In the last year or so, working with a large customer who makes fairly heavy use of this pattern, I’ve noticed another concern. Sometimes (and I can’t figure out exactly what triggers it) the generated execution plan will do a seek against the nonclustered index and then a key lookup against the columnstore, as seen below. This is bad for two reasons: first, the key lookup is extremely expensive, since columnstores are generally very large; second, the key lookup runs in row execution mode rather than batch mode, and it drops the rest of the plan into row mode, slowing the query down even further.
I’ve anonymized the schema, but that key lookup is against my clustered columnstore index, and this query never finished. So what’s the resolution? Index hints. I’m not a big fan of using hints, but sometimes you need to kick the query optimizer in the shin. When I changed this query to hint the columnstore index, it completed in around 7 seconds. The ultimate fix is for Microsoft to correct this costing, but that’s hard. You can vote for my User Voice item here:
If you see this pattern pop up at all, you will definitely want to hint your queries.
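Here is a hedged sketch of what that hint looks like; the table, index, and column names are anonymized placeholders rather than the real schema, and the query itself is only representative.

```powershell
# A sketch of forcing the optimizer back onto the clustered columnstore index.
# Object names and the query shape are placeholders, not the customer's real workload.
Import-Module dbatools

$hintedQuery = @"
SELECT   CustomerKey,
         SUM(SalesAmount) AS TotalSales
FROM     dbo.FactSales WITH (INDEX (CCI_FactSales))  -- force the columnstore, avoiding the key lookup
WHERE    OrderDateKey BETWEEN 20180101 AND 20180331
GROUP BY CustomerKey;
"@

Invoke-DbaQuery -SqlInstance 'MyServer' -Database 'MyDw' -Query $hintedQuery
```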
In public cloud computing, platform as a service (PaaS) delivers some key computing capability as a managed service. Since you are reading my blog, you are probably most familiar with Azure SQL Database (which, if you are old like me, you might still call SQL Azure). Azure SQL Database, for the uninitiated, is Microsoft’s database-as-a-service offering: you make a few mouse clicks or run a few lines of code, pay some money to Microsoft, and you have a database for your application. Microsoft handles the software installation, settings, backups and restores, and numerous other operational tasks, and lets you focus on writing your code.
Azure SQL Database’s original pricing model simply referred to a number called database throughput (or transaction; Microsoft really likes to rename things, if you haven’t noticed) units, or DTUs, which are a blended measure of CPU, memory, and read/write IOPS. A pricing model like this allows the cloud provider to abstract away the underlying hardware, which is another simplification of the cloud model. However, especially in the early days of the public cloud, this concept was hard for IT organizations (many of which were still buying physical hardware at the time) to grasp.
Different PaaS offerings provide different levels of abstraction, though they do have a few things in common. You are almost never going to install software yourself; patching and installation are nearly always handled by the vendor. However, with some offerings you may still be doing things like configuring storage and networking. A good example of this is the new SQL Server offering, Managed Instance. This excellent post from the SQLCAT team shows the complexity of sizing data files to get maximum performance out of the underlying storage, which requires a really firm understanding of Azure storage and how it works on the platform. You also need to know a little bit about virtual networking, because you will need to get your managed instance “wired” into a network where it can talk to your application.
Another difference with Managed Instance is that you choose a tier, General Purpose or Business Critical (which affects your storage performance and availability), then the number of cores and the hardware generation you would like. While Azure SQL Database is also moving to this model, it is noteworthy to see the difference in abstraction. I like to think of this as a curve with pure infrastructure as a service (IaaS), VMs in the cloud, on one side, and software as a service (SaaS) like Office 365 on the other. Managed Instance is definitely closer to the IaaS side of things.
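To show how those choices surface at provisioning time, here is a rough sketch using the Az PowerShell module; every name and value is a placeholder of mine, and the exact parameter names can vary between module versions, so treat it as illustrative rather than a recipe.

```powershell
# A rough sketch only; all names and values are placeholders, and parameters may differ by Az version.
$cred = Get-Credential   # SQL admin login for the new managed instance

$params = @{
    Name                    = 'mi-demo'
    ResourceGroupName       = 'rg-demo'
    Location                = 'eastus2'
    SubnetId                = '/subscriptions/<sub-id>/resourceGroups/rg-demo/providers/Microsoft.Network/virtualNetworks/vnet-demo/subnets/mi-subnet'
    AdministratorCredential = $cred
    SkuName                 = 'GP_Gen5'   # tier (General Purpose) plus hardware generation (Gen5)
    VCore                   = 8           # number of cores
    StorageSizeInGB         = 256
}
New-AzSqlInstance @params
```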
Another thing you want to verify is the backup situation for your chosen service. While most services take backups (and you may not even have the option to do your own, as in Azure SQL DB), it is important to know what your data loss SLA is and what your options are if there is an outage or a disaster.
I was inspired to write this post because, while delivering training last week, I deployed a new Azure Databricks cluster and was totally blown away by how simple the deployment process was: you had two performance options (advanced deployments exposed more settings), but it was dead simple from a deployment perspective. I don’t think it’s a good thing or a bad thing that different services have different levels of configuration; some services need that flexibility, but sometimes it’s nice just to have a box that does a service for you.
One of the unique things about managing SQL Server on Azure VMs is that we use the local D: drive for TempDB. The D: drive (or /dev/sdb1 for those of you running on Linux) is a direct-attached solid-state drive on nearly all VM tiers, which offers the lowest-latency storage in Azure, making it ideal for TempDB’s busy workload. There is only one catch: that temporary volume in Azure is ephemeral, meaning its contents get wiped whenever your VM is rebooted or redeployed.
You may think this isn’t a big deal: TempDB gets recreated every time your instance restarts, so who cares if the files get wiped? Well, it’s not so much the files as the directory the files live in. You could just place your files at the root of the D:\ drive, but that would require your SQL Server service account to run as a local administrator. Since we like to follow security best practices, we aren’t going to do that. I usually follow the process defined by Jim Donahoe (b|t) in this post.
I was teaching Azure infrastructure last week and decided it might be a good idea to do this using Desired State Configuration (DSC), which is part of Azure Automation. First, however, I had to code it up in PowerShell. You’ll note I’m also calling dbatools, which greatly simplifies my TempDB config: by using Set-DbaTempDbConfig I just need to pass in the volume size, which I can get from WMI. I’m allocating 80% of the volume to TempDB (I use 80% because it’s below the alert threshold of most monitoring tools), and then the script does the rest.
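The heart of it looks something like this condensed sketch; the folder path and instance name are placeholder choices of mine, and the full script (linked in the next paragraph) does more.

```powershell
# A condensed sketch of the approach, not the exact script from GitHub. Folder path and
# instance name are placeholders. Requires the dbatools module.
Import-Module dbatools

# 1. Recreate the TempDB folder on the ephemeral D: drive (it vanishes when the VM is redeployed).
$tempDbPath = 'D:\TempDB'
if (-not (Test-Path -Path $tempDbPath)) {
    New-Item -Path $tempDbPath -ItemType Directory | Out-Null
}

# 2. Get the D: volume size via WMI/CIM and allocate 80% of it to TempDB.
$volume     = Get-CimInstance -ClassName Win32_Volume -Filter "DriveLetter = 'D:'"
$dataSizeMB = [math]::Floor(($volume.Capacity * 0.8) / 1MB)

# 3. Let dbatools calculate the file layout and point TempDB at the local SSD.
Set-DbaTempDbConfig -SqlInstance 'localhost' -DataFileSize $dataSizeMB -DataPath $tempDbPath -LogPath $tempDbPath

# 4. Restart SQL Server so TempDB comes back up in the new location.
Restart-Service -Name 'MSSQLSERVER' -Force
```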
You can import PowerShell modules into Azure Automation now (this is a relatively recent change). I don’t have this fully baked into DSC yet, but you can see the PowerShell to create the PowerShell script (yes, you read that correctly) and the scheduled task on GitHub.
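If you just want the shape of the startup piece, registering a scheduled task that reruns the script at boot looks roughly like this; the task name and script path are placeholders of mine, not the ones in the repo.

```powershell
# A sketch of wiring the script to run at startup; task name and script path are placeholders.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\Set-TempDbOnDDrive.ps1'
$trigger = New-ScheduledTaskTrigger -AtStartup

Register-ScheduledTask -TaskName 'Configure TempDB on D' -Action $action -Trigger $trigger `
    -User 'NT AUTHORITY\SYSTEM' -RunLevel Highest
```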
I’d welcome any feedback and I will add a new post when I finish the DSC piece of this.
A couple of weeks ago, my friend and colleague Matthew Roche (t) put out a poll about people’s feelings on vendors who don’t publicly advertise the costs of their products. In the past, in a disconnected world where most of your customers had sales reps and finding all of your competitors’ pricing wasn’t possible, abstracting prices to minimize comparisons made business sense. In an interconnected world, especially given the nature of cloud computing pricing (usually straightforward storage plus compute costs), abstracting or hiding your prices just annoys potential customers.
In roles where I’ve had to set budgets, I’ve frequently been frustrated, especially with hardware vendors, by their inability to give me a rough idea of their pricing. I’m not asking for a quote; I understand discounts, volume pricing, and the like. I just want to know the number of zeroes in your product costs. I understand different customers pay different amounts, but make your pricing clear enough that I can understand it. As much as I’ve bagged on Oracle for their “aggressive” sales tactics, I’ve always been able to go to their website and get the full retail price list for their products, and I applaud them for that.
Why am I writing about this now? I mostly thought the IT industry was beyond this type of pricing; with the move toward cloud computing, pricing has become much more transparent. However, last week I was at a very large customer, and we were talking to a vendor with a very promising technology. The vendor couldn’t give the customer any semblance of pricing, which was a two-fold problem: their licensing model was so abstracted that I had no idea what I was buying (and you thought DTUs were bad), and the sales rep could barely give us any idea of what our monthly spend for the product would be.
Part of the modern, connected world is that consumers can quickly comparison shop. For example, 80% of retail customers check pricing online while they are in a store (Regalado, Antonio. “It’s All E-Commerce Now.” Technology Review, November 4, 2013. http://www.technologyreview.com/news/520786/its-all-e-commerce-now/). Enterprise customers are doing the same thing; while that may be frustrating to old-school salespeople, it’s a fact of life now.
Make your pricing as transparent as possible. If your prices may fluctuate due to factors like cloud pricing, add a footnote to your price list, but still make it clear. Your sales message will be all the better for it.