The MERGE and large data sets

Recently I was doing some work at a customer and they were having some performance problems with their ETL into their data warehouse (see I do know at least some BI). One of the steps which was taking a long time was a MERGE statement which was updating existing rows and inserting new rows.

The target table in this case has several hundred million rows, so it’s a pretty good sized table.

It was a rather simple MERGE statement so we tried changing it from MERGE to separate UPDATE and INSERT commands because I remembered someone complaining at a conference about how performance went down when they started using MERGE.

So we changed the stored procedure and let the new version run the next morning. And boy was it a good decision. We reduced the run times of that step by about 50%. We changed another few MERGE statements which use large tables and got other great improvements, in one case reducing the run time of the job step by 75%. A heck of an improvement.

While the MERGE statement was easier to manage, the goal is to make the system run fast. We did that in this case, very nicely.

So the lesson here is that if you have large data sets that you are working with, MERGE isn’t for you.



One Response

  1. Your assertion that MERGE is slow needs to be backed up with some hard evidence.  What was the reason for this? When is it acceptable to use MERGE? You should also provide an example of the code you were using. Maybe there was a problem with your code? Without more rigourous analysis, there’s not much to be learned here.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?