Microsoft SQL Server has a much-maligned performance monitoring (perfmon) counter called Page Life Expectancy (PLE). This counter reports how long, in seconds, SQL Server expects a data page to stay in the buffer pool.
Now it’s true that looking at this counter right after SQL Server has been restarted is fairly pointless. When SQL Server starts up, PLE increases by one for every second that passes until the cache is completely full. That is exactly what we would expect, because the PLE counter has no historical information from before the restart to work from. All it knows is that SQL Server started 10 seconds ago, so PLE is 10 seconds; one second later it will be 11 seconds, because that is still all the information the instance has. So when should we be looking at PLE?
The first reason to monitor PLE is to establish a baseline for what this counter normally reports on your server. If we don’t have that baseline, we can’t tell whether a change in the value represents a problem.
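One way to capture that baseline is to sample PLE from SQL Server itself rather than from perfmon, on a schedule (for example, via a SQL Agent job that logs the value to a table). A minimal sketch:

```sql
-- Sample the current instance-level PLE value from the DMVs.
-- Collect this regularly to build the baseline discussed above.
SELECT [object_name],
       cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';
```

The `LIKE` on `object_name` is there because the object name is prefixed with the instance name (e.g. `SQLServer:` or `MSSQL$InstanceName:`), which varies by installation.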
When the value changes dramatically
PLE doesn’t have a single target number to watch for. What we are looking for are changes relative to the baseline (see above). When PLE drops drastically, is that drop normal, or is it out of the ordinary? If PLE for the server is normally 5,000 and it drops to 65, is that expected behavior or something we need to look into?
One of the things we can do with PLE is look for trends. If PLE drops every morning at 3am, then we need to see what’s happening at 3am that is causing SQL Server to flush the buffer pool each night.
NUMA Node Imbalance
PLE is reported at the instance level, but that number is just the result of a math formula. The raw PLE values are gathered at the NUMA node level. If we look at the per-node PLE values and compare them to each other, we can spot potential imbalances between the NUMA nodes and dig into why they exist. Maybe we have a server configuration issue that needs to be addressed? Maybe we need to change the soft-NUMA settings for the server?
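The per-node values live in the same DMV as the instance-level number, just under a different object. A sketch that returns both side by side, so a spread between nodes stands out:

```sql
-- Instance-level PLE (Buffer Manager) alongside the per-NUMA-node
-- values (Buffer Node). A large spread between nodes suggests an
-- imbalance worth investigating.
SELECT [object_name],
       instance_name,            -- NUMA node number for Buffer Node rows
       cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND ([object_name] LIKE '%Buffer Manager%'
    OR [object_name] LIKE '%Buffer Node%');
```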
PLE Doesn’t Give Answers
As we’ve seen, PLE doesn’t give answers. Instead, it points you in a direction; it is one of the diagnostic tools that can show you there is a problem that needs to be investigated and identified. PLE won’t tell you specifically what the problem is, only that one exists and needs to be chased down.
This is a clickbait post title, sorry. You are here now. The correct answer is that you should buy MySQL as a managed database service from your favorite cloud provider (Google, Amazon, and Azure all offer prebuilt database-as-a-service offerings), because they have gone through the trouble of making their solution highly available. I’ll speak to Azure, because I’m familiar with the platform: Microsoft doesn’t employ MySQL-level high availability per se, but both the storage and the VM are highly available. If there is a disk failure, the service is not impacted, and if there is a failure in the compute tier, a new VM is provisioned.
My second recommendation, if you really, really want to build your own thing, is to build a Windows Server Failover Cluster and use shared storage. Make the MySQL service a clustered resource, and assign a floating IP to the service that will fail over with it. (Yes, I know you have to pay M$ for the Windows Server licenses.)
Why shouldn’t you use an open source solution to make your MySQL database highly available? First, let’s look at a picture of a common MySQL high availability architecture:
If we think about what we need a clustering solution to provide it comes down to a few things:
Providing a floating IP address to allow connection to the primary
Checking the health of the database services and initiating a failover in the event one of them isn’t healthy
Executing a clean database failover and providing the ability to easily fail back
Ensuring the stability of the overall cluster, maintaining quorum, and avoiding split brain scenarios
If you are using a shared storage scenario, the clustering solution needs to manage the shared storage to coordinate failover with services.
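To make the “clean failover” requirement above concrete: with MySQL’s GTID replication, promotion boils down to a handful of statements that the clustering solution must orchestrate correctly (the hostnames and credentials below are placeholders, and this is a rough sketch, not a complete runbook):

```sql
-- On the replica being promoted:
STOP SLAVE;                      -- stop applying events from the failed primary
-- ...wait for the relay log to be fully applied, then:
RESET SLAVE ALL;                 -- clear the old replication configuration
SET GLOBAL read_only = OFF;      -- allow writes on the new primary

-- On the old primary, once it comes back, demote it to a replica:
CHANGE MASTER TO
    MASTER_HOST = 'new-primary.example.com',   -- placeholder hostname
    MASTER_USER = 'repl',                      -- placeholder account
    MASTER_PASSWORD = 'secret',                -- placeholder password
    MASTER_AUTO_POSITION = 1;                  -- let GTIDs find the position
START SLAVE;
```

Every one of these steps has failure modes (a replica that isn’t caught up, an old primary that accepted writes before being demoted), which is exactly the coordination work the cluster manager exists to do.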
If you are using SQL Server with Windows Server Failover Clustering, the cluster service takes care of all of the above, and more. When you look to do this on Linux for MySQL, there are about 10 different sets of components you can use to make the service highly available. At the base of all of these solutions is MySQL replication, which is fairly straightforward transactional replication. MySQL’s replication service is fairly robust, and the GTID implementation is pretty solid.
The problem is that the rest of the components are all mix and match. You could use HAProxy to float the IP address, but there’s no way to do a smart health check of the database; it simply does a port connection test. Which means that if your primary goes away and then comes back, without some advanced configuration your floating IP is going to fail back to the original primary whether or not it’s actually still the primary in your replication pair. This is but one example: you are going to end up with three or four different components to execute each of these functions, and congratulations, you are now in charge of a complex distributed system that you are responsible for administering for the rest of your life.
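To illustrate the port-check limitation, a minimal HAProxy configuration fragment for this topology might look like the following (addresses and server names are placeholders). The `check` keyword alone is only a TCP connect test; it cannot tell a promoted replica from a restarted old primary, which is why production setups bolt on an external check script:

```cfg
listen mysql-primary
    bind 10.0.0.100:3306
    mode tcp
    # 'check' here is only a TCP port test -- it says nothing about
    # which node is actually the replication primary.
    server db1 10.0.0.11:3306 check
    server db2 10.0.0.12:3306 check backup
```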
But Joey, Facebook/Google/pick your other favorite online megacorp runs MySQL and supports it with five 9s. OK, sure, I don’t disagree with this, and as databases, MySQL and PostgreSQL are generally fine. But look around at your engineering staff. Wait, do you have engineering staff? If you don’t have a few people with both really good Linux sysadmin skills and DBA skills, you are going to find yourself pretty quickly in a situation where support is a challenge.
Finally, consider whether you need an HA solution at all. Are you running on a virtual machine? As long as your infrastructure is solid, that probably gets you to about 99.5% availability on a bad week. What you absolutely want to avoid is the Windows 2000 paradigm, where your high availability solution incurs more downtime than a standalone system would.
SQL Server 2017 (compatibility level 140) brought us many Intelligent Query Processing (IQP) features, formerly known as Adaptive Query Processing, that improve workload performance straight out of the box with no code changes to implement. One of the features introduced was Adaptive Joins. With this feature, the join operator is dynamically determined at runtime by the optimizer, which defines a threshold number of rows and then chooses between a Nested Loops or Hash Match join operator. This switch can produce a better join and improve performance of your queries without you having to lift a finger.
Not all queries will qualify for this new feature. It only applies to SELECT statements that would otherwise have used a Nested Loops or Hash Match; no other join types apply. In addition, the query must run in batch mode, either by using a Columnstore index in the query or via the SQL Server 2019 Batch Mode on Rowstore feature. To find out more about the latter, I recently blogged about Batch Mode on Rowstore here.
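On versions before SQL Server 2019, a commonly used trick to make a rowstore query eligible for batch mode (and therefore for Adaptive Joins) is an empty filtered nonclustered columnstore index. The table and index names below are illustrative:

```sql
-- A pre-2019 trick: the always-false filter means the index stores no
-- data, but its presence makes batch mode available to queries on the
-- table. Table and column names here are examples only.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_BatchModeEnabler
ON dbo.SalesOrderDetail (SalesOrderID)
WHERE SalesOrderID = -1 AND SalesOrderID = -2;
```

On SQL Server 2019 and later under compatibility level 150, Batch Mode on Rowstore can kick in without any such index.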
Now let us look at the difference between the two join operators the optimizer chooses from in this feature.
Hash Match – builds an in-memory hash table from the rows of one input (the build input), then reads the second input (the probe input) and hashes each row to find matches. It is relatively expensive and requires a lot of memory resources.
Nested Loops – for each row of the outer input, it searches the inner input for matching rows. It is less expensive than a Hash Match and ideal for small row counts; with few rows to compare, it is the fastest join operator and requires the least I/O.
For this feature a new operator was introduced to show us that an Adaptive Join was used, and the properties give us details on how it determined which join to use.
At runtime, if the row count is smaller than the adaptive threshold, a Nested Loops join is chosen; if it is larger than the threshold, a Hash Match is chosen. It is that simple. This can be great for workloads that fluctuate between small and large row inputs for the same query. Note the screenshot below: based on the estimates, the plan would have used a Hash Match, but during actual execution it dynamically changed to a Nested Loops join.
The adaptive threshold is based on operator cost. The optimizer evaluates the cost of each join operator for the operation; the row count where those cost curves intersect (the tipping point) is used as the threshold. Microsoft Docs gives us a good image of this.
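As a toy illustration only (the real cost model is internal to the optimizer and far more involved), imagine a nested loop whose cost climbs steeply with the row count and a hash match with a high startup cost but a shallower slope. The threshold is simply where the two cost lines cross:

```python
def adaptive_threshold(hash_startup, hash_per_row, loop_per_row):
    """Row count where two toy linear cost lines intersect.

    hash cost = hash_startup + hash_per_row * rows
    loop cost = loop_per_row * rows
    Setting the two equal and solving for rows gives the crossover
    point: below it the nested loop is cheaper, above it the hash match.
    All coefficients here are made up for illustration.
    """
    return hash_startup / (loop_per_row - hash_per_row)

# Example: hash startup cost 500, 0.5/row; nested loop 3.0/row.
threshold = adaptive_threshold(hash_startup=500.0, hash_per_row=0.5,
                               loop_per_row=3.0)
print(threshold)  # 500 / 2.5 = 200.0 rows
```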
Like any SQL Server feature, you have the ability to turn it off by disabling it if you find it is not providing any performance gains or is causing query regressions in your environment.
-- SQL Server 2017
ALTER DATABASE SCOPED CONFIGURATION SET DISABLE_BATCH_MODE_ADAPTIVE_JOINS = ON;
-- Azure SQL Database, SQL Server 2019 and higher
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ADAPTIVE_JOINS = OFF;
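If you want to confirm how the option is currently set before or after changing it, the database-scoped configurations are visible in a catalog view; a quick check looks something like this:

```sql
-- Verify the current state of the adaptive join setting for this database
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name LIKE '%ADAPTIVE_JOINS%';
```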
Intelligent Query Processing gives us many new “auto” fixes for our queries, and Adaptive Joins is one that has piqued my interest as a database administrator who loves performance tuning. Improper join choices made by the optimizer can really hinder performance and would likely require me to implement query hints or plan guides. I really like that SQL Server now makes intelligent decisions automatically and fixes these issues for me on the fly without my intervention.
Sometimes, as a DBA, I am lazy and want the ability to execute all of my tasks in one place. Lucky for me, I discovered the other day that I can change my Azure SQL Database Service Level Objective options within SQL Server Management Studio (SSMS) without ever having to go to the Azure portal. By right-clicking on your database, choosing Properties, and going to the Configure SQL page, you can change your Subscription and Azure Location. Even better, I can also scale my Edition (generally referred to as Service Tier outside of SSMS), Size, and Storage Tiers up and down as well.
You will have to authenticate to Azure the first time you use this.
Once authenticated you will see the below options available.
Depending on which Service Tier you have selected, such as Basic or Premium, or the more current General Purpose or Business Critical, the dropdowns will dynamically change for each option. As you can see in the example above, it is currently set to Premium, so I see only the “P” level objectives (compute and memory levels). However, if I changed my Edition to anything else, the Service Level Objective choices would change accordingly. Azure changes frequently, as do the offerings. SSMS makes a call to the SQL resource provider in Azure to get the offerings, so it should always be current, though it may look different from this screenshot. If the option you want is not in the dropdown, you also have the ability to simply type the value.
The Max Size option lets you see your database’s current maximum storage size or scale it up and down when needed. Leaving it blank will set it to the default size for the edition and service level objective.
Clicking OK implements these changes, and they may cause a brief (it should be minimal) downtime event, so be careful. Another thing worth noting is permissions: per Microsoft Docs, to alter a database the login must be either:
the server-level principal login
a member of the dbmanager database role in master
a member of the db_owner database role in the current database, or dbo of the database
If you are GUI averse, you can also script these changes out to T-SQL and run those. For example, this script changes the Service Level Objective (SLO) to a Premium P2.
ALTER DATABASE [YOUR DATABASE] MODIFY (SERVICE_OBJECTIVE = 'P2');
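And if you want to check where a database currently sits before scaling it, the edition and service objective are exposed through `DATABASEPROPERTYEX`; a quick sketch:

```sql
-- Check the current edition and service objective of this database
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Edition')          AS edition,
       DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS service_objective;
```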
Once again Microsoft has catered to lazy DBAs like me that want everything in one place. This is another one of those things I am grateful to have the ability to do in SSMS.
As Microsoft MVPs and Partners, as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure, and network challenges.
And sometimes we’re asked to share what we did, at events like Microsoft’s PASS Summit 2015.