I was working on performance tuning the data deletion process that I use to remove data from our database. We delete massive amounts of data on a daily basis, so our data deletion process is pretty important to us. As we’ve grown our customer base the amount of data to be deleted every day has also grown, and the amount of time to delete that data has gone up as well.
We it has started to take stupid amounts of time to delete the data, so I started digging into the data deletion procedures.
The first thing I looked at was the actual delete statements. These seams ok, the actual deletes were happening very quickly (we process deletes in batches of 1000 records per batch to minimize locking of data). So next I looked at the part of the code where we select the records to be deleted. Looking at the execution plan, everything looked ok.
But this little chunk of code took about 50 minutes to run. Pretty bad when only returning 1000 numbers back from the database.
[sql]SELECT TOP (@BatchSize) a.PolicyIncidentId
FROM PolicyIncident a WITH (NOLOCK)
JOIN #ComputersToProcess ComputersToProcess ON a.ComputerId = ComputersToProcess.ComputerId
WHERE CaptureTimestamp < ComputersToProcess.StartDeleteAt
The first thing that I did was put a primary key on the @ComputersToProcess table variable. That turned the table scan into a Clustered Index Scan, but didn’t do anything for performance.
The next thing I did was switch the table variable to a temp table (without a primary key). This really didn’t do anything to speed up the process as there is still no statistics on the data. However this time the execution plan actually shows you that there’s no statistic on the temp table.
Now, I didn’t want to put at non-clustered index on the table keeping the table as a heap, and a clustered index that wasn’t a primary key wasn’t going be any more effective than a primary key, so I put a primary key on the table. While the query cost percentage went up from 2% to 7% the actual run time went down from 50 minutes to just 1 second.
Now I didn’t make any other code changes to the procedures, just changing from the table variable to the temp table, and adding a primary key and this one little three line query went from an hour to a second. Its amazing how much such a small change can make things run smoother.
Now obviously this isn’t going to fix every problem. But in my case I’m putting a little over 190k rows into the table variable (now temp table) and this is just to much for the table variable to take. Keep in mind that with table variables the SQL Server has statistics, but it assumes only a single row per temp table, no matter how much data is actually in the table variable.