Partitioned Tables and ColumnStore

ColumnStore indexes are all the rage with data warehouses. They’re fast, they’re new(ish) and they solve all sorts of problems when dealing with massive amounts of data.  However they can cause some issues as well if you aren’t very careful about how you setup your partitions on the ColumnStore index.  This is because, you can’t split a ColumnStore partition once it contains data.

Now, if everything is going according to plan you create your partitions well in advance and there’s no issues.

However, if everything hasn’t gone according to plan and someone forgets to create the partitions and you end up with rows in the final partition, you can’t create any more partitions because you can’t split the partition.

We can see this pretty easily with a quick repro.

CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES ( 1, 100, 1000 );
GO

create partition scheme myScheme1
as partition myRangePF1
all to ([primary])

create table t1
(c1 int)
on myScheme1(c1)

create clustered columnstore index cs on t1 on myScheme1(c1)

insert into t1
(c1)
values
(1), (10), (100), (600)

With our table here we’ve got a partition function with break points at 1, 100, and 1000. We then insert rows into a columnstore index on that table with the values of 1, 10, 100 and 600. If we try and split our partitions using the ALTER PARTITION FUNCTION command with any value between 1 and 1000 we’ll get an error message.

ALTER PARTITION FUNCTION myRangePF1 ()
SPLIT RANGE (500);

This is because those partitions contain data. If we try and more any of the partition ranges (1, 100, or 1000) then we’ll get an error because we can’t merge them together either.

So how to be work around this issue? Well the easiest way is to delete the data from the table, then fix the partitions, then put the data back. And frankly, that’s the only option. There’s a variety of ways to do that. The easiest option is probably to partition switch out that partition to an un-partitioned table. Then it’s just a matter of moving that data back into the partition in question.

Now if the partition that you’re currently writing to is the last partition, and you need to keep writing to that partition while this is all happening thing are going to get a little more interesting as you’ll need to take an outage in order to do all this work. There’s really no good solution besides taking an outage to move data around and get it correct in order to resolve this issue.

I wish I had a better answer here, but so far there’s no good solution.

Denny

The post Partitioned Tables and ColumnStore appeared first on SQL Server with Mr. Denny.

Share

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?