Recently I was doing some work at a customer and they were having some performance problems with their ETL into their data warehouse (see I do know at least some BI). One of the steps which was taking a long time was a MERGE statement which was updating existing rows and inserting new rows.
The target table in this case has several hundred million rows, so it’s a pretty good sized table.
It was a rather simple MERGE statement so we tried changing it from MERGE to separate UPDATE and INSERT commands because I remembered someone complaining at a conference about how performance went down when they started using MERGE.
So we changed the stored procedure and let the new version run the next morning. And boy was it a good decision. We reduced the run times of that step by about 50%. We changed another few MERGE statements which use large tables and got other great improvements, in one case reducing the run time of the job step by 75%. A heck of an improvement.
While the MERGE statement was easier to manage, the goal is to make the system run fast. We did that in this case, very nicely.
So the lesson here is that if you have large data sets that you are working with, MERGE isn’t for you.