Keep It Simple Stupid

Published On: 2019-07-24By:

Some of us take days and weeks to write a blog, while others can take two minutes and write a great blog *cough* *cough* Denny (B|T). Why is that? I think it’s simple, we are thinking to hard about it and over complicating things. Just sit down at your computer and write what’s on your mind. It doesn’t have to be rocket science or something completely mind blowing or life changing. I’ve seen several, as of late, crowd source their next topic, I think this a great avenue to take as it is those people that are likely your readers. There’s nothing wrong with asking others what you should write about, as a matter of fact you might be helping them solve their latest issue.

Anyhow, we made comment today in our team’s chat at DCAC as to why Denny & Joey (B|T) can write blogs and articles so easily, while some of us have must ourselves to get one out as often as we can.

I think the answer is KISS… Keep It Simple Stupid.

Stop overthinking and just write.

I challenge all those bloggers out there, especially those struggling to write their next blog, to set a timer for 5 minutes and see what you come up with. I’d love to read them.

NOTE: This blog was written with a time of 5 mins with 227 words.


Contact the Author | Contact DCAC

Understanding Columnstore Indexes in SQL Server Part 3

Published On: 2019-07-17By:

My last two blogs were on understanding columnstore and creating columnstore indexes. In this post, I will touch on one last topic, when to use columnstore indexes. As noted in my prior posts these indexes are designed for large data warehouse workloads, not normal OLTP workload tables. As with any index design it is important to know your data and what you will be using the data for. Be sure to look to what types of queries you will be running against it, as well as how the data is loaded and maintained. Let’s look at a few questions you should ask yourself before deciding to implement a columnstore index. Just because your table consist of millions of rows doesn’t mean columnstore is the right way to go.

First you need to know the data you are designing the index for.

Is your table large enough to benefit? As we reviewed in my first post usually this means in the millions of records range as these rows are divided into groups of rows, called a rowgroup.  A rowgroup has a minimum of 102,400 rows with a max of approximately 1 million rows.  Each rowgroup is changed into column segments. Therefore, having a columnstore index on a table with under a 1 million rows does not make sense in that if the table is too small you don’t get the benefits of compression that comes with the column segments. A general recommendation is to use columnstore indexes on the fact tables in your data warehouse, and on very large dimension tables, containing over 5 million rows.

Is your data volatile, meaning changing frequently? Rule of thumb says you want tables that rarely have data modifications, more specifically, where less then 10% of the rows are modified. Having large numbers of deletes can cause fragmentation, which adversely affect compression rates, thus reducing the efficiency of the index. Updates in particular are expensive, as they are processed as deletes followed by inserts, which will adversely affect the performance of your loading process.

What datatypes are in your table? There are several data types that are not supported within a columnstore index. Data types like varchar(max), nvarchar(max), or varbinary(max) were not supported until SQL Server 2017, and typically aren’t the best fit for this type of workload. Additionally, if you are using uniqueidentifiers (GUIDs) you won’t be able to create your index as they are still not supported.

Next what are you doing in your queries.

Are you doing aggregations or performing analytics on the data or are you looking for specific values? The standard B-tree rowstore indexes are best for singleton lookups (single values) and are sometimes used in tandem with a columnstore index. If you’re using an index to cover a where clause that does not look up a range of values and is just filling predicates, then columnstore does not benefit. This especially  true if you need to “cover” that query with included columns since columnstore does not allow included columns. However, columnstore is designed to quickly process aggressions especially on a grouped range of values. So, if you are performing aggregations or analytics usually columnstore can give you large performance gains as it can do full table scans to perform aggregations very fast.

Now there are times where you want to seek a specific value and perform aggregations (think average sale price in the last quarter for a specific product). In these cases you may benefit from a combination of a rowstore and columnstore index. Creating the columnstore to handle the grouping and aggressions and covering the index seek requirement with the rowstore. Adding these b-tree indexes will help your query performance, but they can dramatically impact the loading process—if your data loads are large enough it may be more efficient to drop the b-tree index, and rebuild it after loading the data into the columnstore.

Is this a data warehouse fact or dimension table? As we know a dimension table is usually used to find specific values, mostly lookups to match with an aggregated value from a fact table. If it’s a dimension table, typically you are going to use a b-tree based models, with the exception of very large dimensions. We tend to use columnstore on fact tables in the data warehouse as these tables are normally the source of aggregations.

Summary

Knowing your data is pivotal in deciding whether or not a columnstore index is best suited for your query needs. Don’t just jump to columnstore indexes now that you have an understanding of them, and your tables are large.  Make sure to take the time to choose the right indexing option for your usage patterns, the data and the overall environment.


Contact the Author | Contact DCAC

Understanding Columnstore Indexes in SQL Server Part 3

Published On: By:

My last two blogs were on understanding columnstore and creating columnstore indexes. In this post, I will touch on one last topic, when to use columnstore indexes. As noted in my prior posts these indexes are designed for large data warehouse workloads, not normal OLTP workload tables. As with any index design it is important to know your data and what you will be using the data for. Be sure to look to what types of queries you will be running against it, as well as how the data is loaded and maintained. Let’s look at a few questions you should ask yourself before deciding to implement a columnstore index. Just because your table consist of millions of rows doesn’t mean columnstore is the right way to go.

First you need to know the data you are designing the index for.

Is your table large enough to benefit? As we reviewed in my first post usually this means in the millions of records range as these rows are divided into groups of rows, called a rowgroup.  A rowgroup has a minimum of 102,400 rows with a max of approximately 1 million rows.  Each rowgroup is changed into column segments. Therefore, having a columnstore index on a table with under a 1 million rows does not make sense in that if the table is too small you don’t get the benefits of compression that comes with the column segments. A general recommendation is to use columnstore indexes on the fact tables in your data warehouse, and on very large dimension tables, containing over 5 million rows.

Is your data volatile, meaning changing frequently? Rule of thumb says you want tables that rarely have data modifications, more specifically, where less then 10% of the rows are modified. Having large numbers of deletes can cause fragmentation, which adversely affect compression rates, thus reducing the efficiency of the index. Updates in particular are expensive, as they are processed as deletes followed by inserts, which will adversely affect the performance of your loading process.

What datatypes are in your table? There are several data types that are not supported within a columnstore index. Data types like varchar(max), nvarchar(max), or varbinary(max) were not supported until SQL Server 2017, and typically aren’t the best fit for this type of workload. Additionally, if you are using uniqueidentifiers (GUIDs) you won’t be able to create your index as they are still not supported.

Next what are you doing in your queries.

Are you doing aggregations or performing analytics on the data or are you looking for specific values? The standard B-tree rowstore indexes are best for singleton lookups (single values) and are sometimes used in tandem with a columnstore index. If you’re using an index to cover a where clause that does not look up a range of values and is just filling predicates, then columnstore does not benefit. This especially  true if you need to “cover” that query with included columns since columnstore does not allow included columns. However, columnstore is designed to quickly process aggressions especially on a grouped range of values. So, if you are performing aggregations or analytics usually columnstore can give you large performance gains as it can do full table scans to perform aggregations very fast.

Now there are times where you want to seek a specific value and perform aggregations (think average sale price in the last quarter for a specific product). In these cases you may benefit from a combination of a rowstore and columnstore index. Creating the columnstore to handle the grouping and aggressions and covering the index seek requirement with the rowstore. Adding these b-tree indexes will help your query performance, but they can dramatically impact the loading process—if your data loads are large enough it may be more efficient to drop the b-tree index, and rebuild it after loading the data into the columnstore.

Is this a data warehouse fact or dimension table? As we know a dimension table is usually used to find specific values, mostly lookups to match with an aggregated value from a fact table. If it’s a dimension table, typically you are going to use a b-tree based models, with the exception of very large dimensions. We tend to use columnstore on fact tables in the data warehouse as these tables are normally the source of aggregations.

Summary

Knowing your data is pivotal in deciding whether or not a columnstore index is best suited for your query needs. Don’t just jump to columnstore indexes now that you have an understanding of them, and your tables are large.  Make sure to take the time to choose the right indexing option for your usage patterns, the data and the overall environment.


Contact the Author | Contact DCAC

Understanding Columnstore Indexes in SQL Server Part 2

Published On: 2019-07-10By:

In my last blog I explained what a columnstore index is, in this blog, we will dive into creating a clustered columnstore index and look at the performance differences the index can make.

Let’s get started.

Using AdventureworksDW2016CTP3 we will work with the FactResellerSalesXL table which has 11.6 million rows in it. The simple query we will use as a demo just selects the ProductKey and returns some aggregations grouping them by the different product keys.

First, we will run the query with no existing columnstore index and only using the current clustered rowstore (normal) index. Note that I turned on SET STATISTICS IO and TIME on. These two SET statements will help us better illustrate the improvements provided by the columnstore index. SET STATISTICS IO displays statistics on the amount of page activity generated by the query. It gives you important details such as page logical reads, physical reads, scans, and lob reads both physical and logical. SET STATISTICS TIME displays the amount of time needed to parse, compile, and execute each statement in the query. The output shows the time in milliseconds for each operation to complete. This allows you to really see, in numbers, the differences.

USE [AdventureworksDW2016CTP3]
GO
SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO 
SELECT ProductKey, sum(SalesAmount) SalesAmount, sum(OrderQuantity) ct
FROM dbo.FactResellerSalesXL
GROUP BY ProductKey

Looking at the results below, it completed five scans, 318,076 logical reads, two physical reads and read-aheads 291,341. It also shows us a CPU Time of 8233 milliseconds (ms) and elapse time of 5008 ms. The optimizer choose to scan the existing rowstore clustered index with a cost of 91% and scanned the entire 11.6 million records to return the 395 record result set.

Another thing worth noting is if you hover over the Clustered Index scan you can see that the storage of this index is Row and the Actual Execution Mode is also Row.

Now let’s create the Columnstore index on this table.

Using the GUI right click on indexes and choose New Index then Clustered Columnstore Index.

Under the General table all you need to do is name the index. If there are any objects in the table that is not compatible you will see them listed in the highlighted section below. There are many limitations and exceptions to columnstore indexes such as certain data types like text, ntext and image, and features like sparse columns. To best see the full list of limitations take a look at the docs Microsoft gives us here.

Because of the compression involved, creating a columnstore index can be very CPU resource intensive. To help mitigate that SQL Server gives us an option, under the Options tab, to overwrite the current MaxDop server setting for the parallel build process. This is definitely something you want to consider while creating columnstore index in a production environment. For this example, we will leave as default. On the other hand, if you are building this index during downtime, you should note that columnstore operations scale linearly in performance all the way up to a MaxDOP of 64, and this can help the index build process finish faster at the expense of overall concurrency.

Per docs.microsoft

max_degree_of_parallelism values can be:

·         1 – Suppress parallel plan generation.

·         >1 – Restrict the maximum number of processors used in a parallel index operation to the specified number or fewer based on the current system workload. For example, when MAXDOP = 4, the number of processors used is 4 or less.

·         0 (default) – Use the actual number of processors or fewer based on the current system workload.

If you choose to script this out, you will get the below T SQL create statement.

USE [AdventureworksDW2016CTP3]
GO
CREATE CLUSTERED COLUMNSTORE INDEX [CS_IDX_FactResellerSalesXL] 
ON [dbo].[FactResellerSalesXL] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0)
GO

When you run this statement, you will get an error which states the index could not be created because you cannot create more than one clustered index on a table at a time.

I did this on purpose to illustrate that you can only have one clustered index on a table regardless if you have columnstore or row store indexes on the table. You can change the DROP_EXISTING = ON to remove the row store clustered index and replace it with the columnstore. Moreover, you have the option to create a non-clustered columnstore index as well. This option is typically used when most of the queries against a table return large aggregations, but another subset does a lookup by a specific value. Adding additional non-clustered index will dramatically increase your data loading times for the table.

However, to keep things simple you will note that the AdventureWorksDW216 database also has a table called dbo.FactResellerSalesXL_CCI which already has a clustered columnstore index created. By scripting that out you can see it looks exactly like the one we tried to create. So, let’s use this table which is identical to the FactResellerSalesXL table minus the columnstore index difference.

USE [AdventureworksDW2016CTP3]
GO
CREATE CLUSTERED COLUMNSTORE INDEX [IndFactResellerSalesXL_CCI] 
ON [dbo].[FactResellerSalesXL_CCI] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

Now getting back to the originally query. Let’s run the exact same statement again the columnstore indexed table that again has the 11.6 million rows.

USE AdventureWorksDW2016CTP3; 
GO
SET STATISTICS IO ON
GO 
SET STATISTICS TIME ON
GO 
SELECT ProductKey, sum(SalesAmount) SalesAmount, sum(OrderQuantity) ct
FROM dbo.FactResellerSalesXL_CCI
GROUP BY ProductKey

Taking a look at the execution plan first. Easily we can see the optimizer went from eight operators to only five operators to complete the transaction and you can see it did an index scan of the clustered columnstore index but this time at a cost of only 36% and read zero of the 11.6 million records.

Next let’s look at the numbers. To see the difference clearer, I have both results for comparison below.

ROW STORE

 

COLUMNSTORE

The columnstore results have a scan count of four, zero logical reads, zero physical reads, zero read-aheads and  22486 lob read-aheads and took less than a second to run. The reason why these lob activities are in the output, is that SQL Server uses its native lob storage engine for the storage of the columnstore segments. There is also an additional cache for columnstore segments in SQL Server’s main memory, that is separate from the buffer pool. The rowstore index show significant more reads. Lastly note the additional table line, you will see Segments Reads =12. Remember in my last post I discussed how columns are now stored in column segments. This is where we see that the optimizer read those segments.

You can also see that the columnstore indexed results took less time CPU time. The first one was 8233 ms with the elapse time of 5008 ms while the second only took a CPU time 391 ms and elapse time of 442ms. That is a HUGE gain in performance.

Remember earlier that when we used the rowstore index the Actual Execution Mode was Row, here using columnstore, it used Batch mode (boxed in red below). If you recall from my part 1 blog batch mode allows the engine to process multiple rows at one time. This also makes the engine able to process rows extremely fast in some cases, giving two to four times the performance of a single query execution. As you can see our aggregation example happened very quickly because only the row being aggregated is read into memory and using the row groups the engine can batch process the groups of 1 million rows, thus we saw 12 segments read a little over 11.6 million rows.

Now with great power, comes great responsibility. Columnstore indexes are designed for large data warehouse workloads, not normal OLTP workload tables. You want tables that rarely have updates and or deletes.  Just because these indexes work really efficiently doesn’t mean you should add them, be sure to research and test before introducing columnstore indexes into your environments.


Contact the Author | Contact DCAC
1 2 3 21

Video

Globally Recognized Expertise

As Microsoft MVP’s and Partners as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure and network challenges.

And sometimes we’re asked to share what we did, at events like Microsoft’s PASS Summit 2015.

Awards & Certifications

Microsoft Partner    Microsoft MVP    Microsoft Certified Master    VMWare Partner    VMWare vExpert
   Best Full-Service Cloud Technology Consulting Company    Insights Sccess Award    Technology Headlines Award    Golden Bridge Gold Award    CIO Review Top 20 Azure Solutions Providers