Understanding Columnstore Indexes in SQL Server Part 1

Published On: 2019-06-26

Recently I reviewed filtered indexes, this time let’s look at columnstore indexes. These indexes are very useful for data warehouse workloads and large tables. They can improve query performance by a factor of 10 in some cases, so knowing and understanding how they work is important if you work in an environment with larger-scale data.

Now, I admit that when these were first introduced in SQL Server 2012 I found them very intimidating (additionally, you couldn’t update them directly). For me, anytime you say columnstore, my mind tends to set off alarms saying wait, stay away, this is too complicated. So, in this post I am going to try and simplify the feature for you.

To do that, you first need to understand some terminology and the difference between a columnstore index and a rowstore index (the normal kind we all use). Let’s start with the terminology.

Columnstore is simply the way the data is stored in the index. Instead of our normal rowstore, or b-tree, indexes where the data is logically and physically organized and stored as a table with rows and columns, the data in a columnstore index is physically stored in columns and logically organized in rows and columns. It is this difference in architecture that gives the columnstore index a very high level of compression, reducing your storage footprint and providing massive improvements in read performance.
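For orientation, here is a minimal sketch of what creating one looks like. This is a hypothetical example (dbo.FactSales is an assumed table name, not from this post); creating and loading columnstore indexes is covered in depth in the next post.

```sql
-- dbo.FactSales is a hypothetical table name; a clustered columnstore
-- index stores the entire table in columnar format rather than as b-tree rows
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
GO
```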

Now, the way the index works is by slicing the data into compressible segments. It takes a group of rows, a minimum of 102,400 rows with a max of 1 million rows, called a rowgroup, and then changes that group of rows into column segments. It’s these segments that are the basic unit of storage for a columnstore index, as shown below. This to me is a little tricky to understand without a picture.

Imagine this is a table with 2.1 million rows and 6 columns. That means we now have two rowgroups of 1 million rows each and a remainder of 100,000 rows, which is called a delta rowgroup. Since each rowgroup must hold a minimum of 102,400 rows, the delta rowgroup is used to store all remaining index records until there are enough rows to create another rowgroup. You can have multiple delta rowgroups awaiting being moved to the columnstore. Multiple delta rowgroups are stored in the delta store, which is actually a B-tree index used in addition to the columnstore. Ideally, your index will have rowgroups as close to 1 million rows as possible to reduce the overhead of scanning operations.
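If you want to see the rowgroups and delta store for yourself, the sys.column_store_row_groups catalog view exposes them. This sketch assumes a hypothetical dbo.FactSales table that has a columnstore index:

```sql
-- OPEN = delta rowgroup (still a b-tree), CLOSED = full and awaiting
-- compression, COMPRESSED = already part of the columnstore
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_description,
       total_rows,
       size_in_bytes
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.FactSales'); -- hypothetical table name
```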

Now, to complicate things just one step further, there is a process called the tuple-mover that moves delta rowgroups from the delta store to the columnstore index. This process checks for closed rowgroups, meaning a rowgroup that has reached the maximum of 1 million records and is ready to be compressed and added to the index. As illustrated in the picture, the columnstore index now has 2 rowgroups that it will then divide into column segments for every column in the table. This creates 6 pillars of 1 million rows per rowgroup, for a total of 12 column segments. Make sense? It is these column segments that are compressed individually for storage on disk. The engine takes these pillars and uses them for highly parallelized scans of the data. You can also force the tuple-mover process by doing a reorganize on your columnstore index.
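A reorganize that forces the tuple mover might look like the sketch below (CCI_FactSales and dbo.FactSales are hypothetical names; the COMPRESS_ALL_ROW_GROUPS option requires SQL Server 2016 or later):

```sql
-- Force the tuple mover to compress closed rowgroups into the columnstore
ALTER INDEX CCI_FactSales ON dbo.FactSales REORGANIZE;

-- Or also compress open delta rowgroups that haven't yet reached
-- 1 million rows (SQL Server 2016+)
ALTER INDEX CCI_FactSales ON dbo.FactSales
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```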

To facilitate faster data access, only the min and max values for each rowgroup are stored in the segment metadata. In addition, query processing for columnstore uses batch mode, allowing the engine to process multiple rows at one time. This also makes the engine able to process rows extremely fast in some cases, giving two to four times the performance of single-row query processing. For example, aggregations happen very quickly because only the column being aggregated is read into memory and, using the rowgroups, the engine can batch-process the groups of 1 million rows. In SQL Server 2019, batch mode is being introduced to some rowstore indexes and execution plans.
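The kind of aggregation that benefits looks like the hypothetical sketch below (dbo.FactSales and its columns are assumed names); the engine can satisfy it by reading only the two column segments involved and processing the rowgroups in batches:

```sql
-- Only the ProductKey and SalesAmount segments are read;
-- the other columns in the table are never touched
SELECT ProductKey, SUM(SalesAmount) AS total_sales
FROM dbo.FactSales -- hypothetical fact table with a columnstore index
GROUP BY ProductKey;
```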

Now hopefully you have a basic understanding of what a columnstore index is. In my next post we will look at how to create one, how to load data, and what limitations columnstore indexes have. We will also get to see the index in action compared to a rowstore index. Stay tuned.

Contact the Author | Contact DCAC

Comparing Execution Plans

Published On: 2019-06-19

When you run a query twice and get dramatically different run times, your first step may be to try to identify the differences in the execution plans. For many, that means running the two different queries (before & after) and splitting your screen in order to visually compare the plans. Did you know SQL Server Management Studio gives you the option to compare two different execution plans? It makes it easy not only to visualize the differences, but it also shows you detailed properties that allow you to dive into the numbers. This functionality was introduced with SQL Server 2016 and is also part of the functionality of the Query Store GUI shown below.

The SQL Server Management Studio comparison process allows you to compare any two .sqlplan files, including those from old SQL Server versions. In the comparison window, the top plan is the first plan you open or have run and the lower is the second plan chosen for comparison. As you can imagine, this is a great tool to use prior to migrations, and as an added bonus, you can compare the plans offline, so there is no need to connect to a SQL Server instance. This information can be very useful and should be added to your testing checklists.

When comparing the plans, Management Studio highlights similar query operators by shading them with the same color. The best place to start looking for differences is in the properties pane, which you can see on the right: wherever there is a not-equal sign (≠) you can easily see what differs in the costing of these two plans.

Let’s see it in action.

To compare execution plans, you first need to save one of the plans you wish to compare. To save the plan, just right click in the execution plan area and choose Save Execution Plan, as shown below in purple.

Once you have a plan saved, go ahead and run the plan you wish to compare it to. After execution, keep that window open, then right click, choose Compare Showplan, and pick your saved plan. The example below is from docs.microsoft.com. As you hover over each node, you can dive into the properties and numbers, shown on the lefthand side by default.

The properties allow you to clearly see the cost and actual row count difference between each node in each plan.

It is important to examine each of the places where the plans are different. I’ve run into situations where two plans had the exact same plan shape and only minor differences in memory grants, but the CPU consumption of each plan was vastly different. This is the kind of scenario where the properties comparison is invaluable. The old approach was to parse the XML of each plan to get the different memory grants.
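For contrast, that old XML-parsing approach might look something like the sketch below, which pulls the requested memory grant out of each cached plan. The XQuery path is one plausible way to do it against the documented showplan XML schema, not a snippet from this post:

```sql
-- Extract the requested memory grant (in KB) from each cached plan's XML
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.usecounts,
       qp.query_plan.value('(//MemoryGrantInfo/@RequestedMemory)[1]', 'int') AS requested_memory_kb
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan IS NOT NULL;
```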

This is just a quick reminder that SSMS gives us this functionality built in. I encourage you to take it for a test drive.


Filtered Index Basics

Published On: 2019-05-22

In this post, we continue with another beginner’s blog about database features that may be unknown to many. Let’s take a look at filtered indexes. Many database administrators are fully aware of the power of indexes and know how to create them. However, I find that some have yet to dive into fully optimizing their indexes and taking advantage of what filtered indexes have to offer.

What is a filtered index?

Simply put, it’s an index with a WHERE clause. It is an optimized nonclustered index that can be narrowed down in scope to better fit a subset of data. Examples include date ranges, years, non-NULL values, or specific product types.

Benefits

Using filtered indexes can improve query performance and plan quality over full-table indexes. Statistics are more accurate since they only cover the rows of the filtered index, resulting in better execution plans. A filtered index can also reduce your index maintenance, both because of the decreased index size and because you only maintain the data in the index that is changed, not an entire table of data. Lastly, since it is smaller in size, it will consume less storage. Why have an index full of rows that are never accessed?
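As a side note, if you want to check which of your existing indexes are already filtered, sys.indexes exposes the filter definition; a quick sketch:

```sql
-- has_filter = 1 marks a filtered index; filter_definition shows its WHERE clause
SELECT name, has_filter, filter_definition
FROM sys.indexes
WHERE object_id = OBJECT_ID('Production.BillOfMaterials');
```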

Let’s see it in action

First create one using the GUI.

In your database, under Tables, navigate to Indexes. Right click on Indexes and choose New Index, then Non-Clustered Index.

Like a normal index, choose your columns, and included columns where needed.

Then choose Filter and type in your expression. Here we are telling the index to filter out all NULL values. Note that I did not include the word WHERE here; when you script the index out, the WHERE is already included.

 

USE [AdventureWorks2014]
GO
CREATE NONCLUSTERED INDEX [FIIDX_ComponentID_StartDate]
ON [Production].[BillOfMaterials]
(
    [ComponentID] ASC,
    [StartDate] ASC
)
WHERE EndDate IS NOT NULL;
GO

 

Let’s look at a couple of execution plans now to see the results. This is a common query in environments where you want to look for items that have a specific end date. Instead of reading all the rows, including the NULLs (active products, represented by the EndDate column), your index is already scoped down to non-active products. If we had an index just on EndDate, the NULL values would also be included, and depending on how many products exist in your table, those rows could be significant.

 

USE AdventureWorks2014;
GO
SELECT ProductAssemblyID, ComponentID, StartDate
FROM Production.BillOfMaterials
WHERE EndDate IS NOT NULL
    AND ComponentID = 5
    AND StartDate > '01/01/2008';
GO

If you find for some reason the optimizer is not choosing to use your filtered index, you can also force its use with a query hint. Caution: query hints are not always the best course of action; this is simply an example of what you can do.

USE AdventureWorks2014;
GO
SELECT ComponentID, StartDate
FROM Production.BillOfMaterials
    WITH (INDEX(FIIDX_ComponentID_StartDate))
WHERE EndDate IN ('20000825', '20000908', '20000918');
GO

Summary

As a DBA, it’s important to consider all options when creating indexes. We tend to think in terms of predicates and included columns, but we don’t dive deeper into the actual uses of the indexes and how we can better eliminate unneeded results within the index itself rather than with a query’s WHERE clause. If you find that your index is only needed for the current year’s data, then filter it at the index level. Don’t read through or store years of data when you don’t have to. Start taking advantage of filtered indexes.


DMVs for the Beginner

Published On: 2019-05-15

I give performance presentations at many different events to all levels of SQL Server professionals. Over time I’ve noticed that some DBAs and developers have never looked at the dynamic management views (DMVs) that have been available within SQL Server since SQL Server 2005. These DMVs are a useful means to determine what is going on with a particular SQL Server instance. So, in this post I am going to list a few of my very favorite DMVs as a reference for those who may be new to them.

Performance Tuning

These dynamic management views are what I use first when looking into any performance issue, or when I need to know what is going on right now with my SQL Server instance. Usage examples are borrowed from docs.microsoft.com.

sys.dm_exec_sessions – This returns current session information such as session IDs and login information like name, host, and client info. It also returns some performance information like memory usage, CPU time, reads, writes, and lots of other useful info.

SELECT login_name ,COUNT(session_id) AS session_count
FROM sys.dm_exec_sessions
GROUP BY login_name;

 

sys.dm_exec_connections – This DMV gives you information on sessions currently connected to SQL Server: information like when the connection started, last read, last write, client net address, most recent sql_handle (the token that uniquely identifies the SQL batch), and other network-connection-related information.

SELECT c.session_id, c.net_transport, c.encrypt_option, c.auth_scheme,
    s.host_name, s.program_name, s.client_interface_name, s.login_name,
    s.nt_domain, s.nt_user_name, s.original_login_name,
    c.connect_time, s.login_time
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
    ON c.session_id = s.session_id
WHERE c.session_id = @@SPID; -- @@SPID returns your current session SPID

 

sys.dm_exec_requests – This one I use a lot when looking at blocking and I don’t have sp_WhoIsActive available. It’s great for looking into blocking and wait times, and for seeing what is currently running on your SQL Server. It will provide wait types; backup\restore\DBCC CHECKDB percent complete; and database and session information. It gives specifics like statement commands (SELECT, INSERT, UPDATE, DELETE, etc.) as well as execution plan information with reads/writes and resource usage.

USE master
GO
SELECT * FROM sys.dm_exec_requests
WHERE session_id = 54;
GO
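When chasing blocking specifically, a narrowed-down variant like this sketch can help (a blocking_session_id of 0 means the request is not blocked):

```sql
-- Show only requests that are currently being blocked, and by whom
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command,
       r.status
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```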

 

sys.dm_os_wait_stats – This gives you aggregated statistics for all the waits encountered by your system, providing a great trail of what has been going on when diagnosing problems the system is having. It provides wait types and times as well as how many tasks are waiting.

USE master
GO
SELECT * FROM sys.dm_os_wait_stats
GO
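Since the raw view returns every wait type, a common refinement is to sort by total wait time to surface the biggest offenders first; for example:

```sql
-- Top waits by cumulative wait time since the last restart
SELECT TOP 10 wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```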

 

sys.dm_os_performance_counters – Looking into the OS through SQL Server is always handy. With this one you can get to the performance monitor counters captured by SQL Server. This does not replace perfmon for detailed performance analysis, but it includes things like buffer counts, locks, file growth events, columnstore counters, cache hit ratios… lots of goodies in this one.

USE master
GO
SELECT * FROM sys.dm_os_performance_counters
GO

 

Query Specific Tuning

When starting to look at query tuning, you need to dive into query plans and statements. These are what I use to find out what queries are running and to get to their query plans.

sys.dm_exec_query_stats – This is an important one. It returns summed performance statistics for query plans that currently exist in the plan cache. It gives statistics tied to a single plan, like reads, writes, total rows, time to complete, plan hash, memory grant info, threads, columnstore stats, spills, and degree of parallelism (DOP, which was added in SQL Server 2016).

SELECT TOP 5 query_stats.query_hash AS "Query Hash",
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM (
    SELECT QS.*,
        SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
            ((CASE QS.statement_end_offset
                WHEN -1 THEN DATALENGTH(ST.text)
                ELSE QS.statement_end_offset
              END - QS.statement_start_offset)/2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS QS
    CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST
) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;

 

sys.dm_exec_query_plan – This gives you the actual execution plan in XML format.

USE master
GO
SELECT * FROM sys.dm_exec_query_plan (your plan handle goes here)
GO

 

sys.dm_exec_cached_plans – These are the plans that are currently available in your cache for reuse. You can use this to retrieve the plan handles needed to get the actual plan from sys.dm_exec_query_plan. Here you can also see the use counts of how many times a plan has been reused while in cache.

USE master
GO
SELECT * FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle);
GO

 

sys.dm_exec_sql_text – This returns the actual query text in an execution plan and is referred to as a dynamic management function instead of a view.

-- acquire sql_handle
SELECT sql_handle FROM sys.dm_exec_requests WHERE session_id = 59; -- modify this value with your actual SPID

-- pass sql_handle to sys.dm_exec_sql_text
SELECT *
FROM sys.dm_exec_sql_text(your sql_handle goes here)

 

Index

sys.dm_db_index_usage_stats – Ever wonder if your index is actually getting used? This one will tell you all the scans, seeks, and writes done on your index and the last time it was used.

USE master
GO
SELECT * FROM sys.dm_db_index_usage_stats
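The raw output only shows object and index IDs, so a join to sys.indexes for readable names is a common refinement; a sketch for the current database:

```sql
-- Resolve object/index IDs to names; restrict to the current database
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name AS index_name,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates,
       us.last_user_seek
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
    ON us.object_id = i.object_id
   AND us.index_id = i.index_id
WHERE us.database_id = DB_ID();
```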

 

sys.dm_db_missing_index_details – This will give you a list of all the missing indexes in your SQL Server. This DMV should not be considered a list of indexes to add, as it will record many potentially overlapping indexes. However, it gives you a great starting point into your server’s indexing needs for you to dive into.

USE master
GO
SELECT * FROM sys.dm_db_missing_index_details
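To gauge which of those suggestions might actually matter, the details view can be joined to its companion group-stats DMV, which tracks how often the optimizer wanted each index; a sketch:

```sql
-- Rank missing-index suggestions by seeks multiplied by estimated impact
SELECT mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mid.index_handle = mig.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON mig.index_group_handle = migs.group_handle
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```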

 

These are just a few of the many that SQL Server has to offer, but if you are just starting out, these are definitely DMVs you should take a look at and add to your arsenal for performance tuning and monitoring your SQL Servers. Lastly, one thing to keep in mind when reading the data from these queries: like many other things inside SQL Server, this data is only as good as your last reboot or service restart.

 

