Approximate COUNT DISTINCT

Published On: 2019-01-03 By:

We have all written queries that use COUNT DISTINCT to get the number of unique non-NULL values in a table. This can cause a noticeable performance hit, especially on larger tables with millions of rows, and many times there is no way around it. To help mitigate this overhead, SQL Server 2019 introduces the APPROX_COUNT_DISTINCT function, which approximates the distinct count instead of computing it exactly. Microsoft documents an error rate of up to 2% with 97% probability, and the function typically returns in a fraction of the time.

Let’s see this in action.

In this example, I am using the AdventureWorksDW2016CTP3 sample database, which you can download here.

SET STATISTICS TIME ON
SELECT COUNT(DISTINCT([SalesOrderNumber])) AS DISTINCTCOUNT
FROM [dbo].[FactResellerSalesXL_PageCompressed]

SQL Server Execution Times:  CPU time = 3828 ms,  elapsed time = 14281 ms.

SELECT APPROX_COUNT_DISTINCT([SalesOrderNumber]) AS APPROX_DISTINCTCOUNT
FROM [dbo].[FactResellerSalesXL_PageCompressed]

SQL Server Execution Times: CPU time = 7390 ms,  elapsed time = 4071 ms.

The elapsed time dropped from about 14.3 seconds to about 4.1 seconds, a great improvement from this new function, even though CPU time nearly doubled (3,828 ms versus 7,390 ms).
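To check the precision claim against your own data, you can run both aggregates side by side and compute the relative error. This is a minimal sketch, assuming SQL Server 2019 or later and the same table used above; the column aliases ExactCount, ApproxCount, and ErrorPercent are illustrative names, not part of the original post.

```sql
-- Compare exact and approximate distinct counts in one query (SQL Server 2019+).
-- Aliases are hypothetical; the table is the one used in the timings above.
SELECT  ExactCount,
        ApproxCount,
        -- Relative error in percent; NULLIF avoids dividing by zero on an empty table
        100.0 * ABS(ApproxCount - ExactCount) / NULLIF(ExactCount, 0) AS ErrorPercent
FROM (
    SELECT  COUNT(DISTINCT [SalesOrderNumber])        AS ExactCount,
            APPROX_COUNT_DISTINCT([SalesOrderNumber]) AS ApproxCount
    FROM    [dbo].[FactResellerSalesXL_PageCompressed]
) AS Counts;
```

Because this query computes the exact count as well, it is only useful for validating the error on a representative table, not for saving time.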

The first time I did this, I did it wrong. A silly typo made a major difference in the results, so take a moment and learn from my mistake.

Note that I use COUNT(DISTINCT(SalesOrderNumber)), not SELECT DISTINCT COUNT(SalesOrderNumber). This makes all the difference. If you get it wrong, the numbers will be way off, as you can see from the result set below. You'll also find that the APPROX_DISTINCTCOUNT query returns much more slowly than the DISTINCT COUNT query, which is not what you would expect.

Remember, COUNT(DISTINCT expression) evaluates the expression for each row in a group and returns the number of unique, non-NULL values, which is what APPROX_COUNT_DISTINCT approximates. SELECT DISTINCT COUNT(expression) just returns the number of non-NULL values of the expression, duplicates included. The DISTINCT applies only to the single result row, so there is nothing distinct about the count.
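A small, self-contained script makes the difference concrete. The table variable @Sales and its values are hypothetical sample data, not taken from AdventureWorks; the script uses only COUNT, so it should run on older SQL Server versions as well.

```sql
-- Hypothetical sample data: six rows, three distinct order numbers, one NULL.
DECLARE @Sales TABLE (SalesOrderNumber nvarchar(20) NULL);

INSERT INTO @Sales (SalesOrderNumber)
VALUES (N'SO1'), (N'SO1'), (N'SO2'), (N'SO3'), (N'SO3'), (NULL);

-- Counts unique non-NULL values: returns 3.
SELECT COUNT(DISTINCT SalesOrderNumber) AS DistinctCount
FROM @Sales;

-- COUNT counts every non-NULL value (5); DISTINCT then only
-- de-duplicates the single result row, so this also returns 5.
SELECT DISTINCT COUNT(SalesOrderNumber) AS NotReallyDistinct
FROM @Sales;

-- COUNT(*) counts all rows, including the NULL one: returns 6.
SELECT COUNT(*) AS AllRows
FROM @Sales;
```

SQL Server may also print a "Null value is eliminated by an aggregate" warning for the first two queries; that is expected, since both ignore the NULL row.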

Always fun tinkering with something new!

