Splitting Files for SQL DW

When doing data warehouse loads into SQL DW you’ll want to break your single large file that you extract into multiple smaller files in order to do loads into the SQL DW (via Azure Blob Storage) as fast as possible. There’s a few ways that you can do this. One way would be to handle this in your ETL that extracts the data from your source database. Depending on how you are extracting the data this can be anywhere from painful to REALLY, REALLY painful.

A much easier way is to just extract the data to a single text file, then split that single text file using a command line tool. Now you are probably asking yourself where you might get this handy command line tool. Well thankfully you can download it right here. While doing some work for a client on SQL DW we went ahead and put this tool together for you which as a command line tool you can easily build into your ETL process. The only requirements for using the tool is that you have .NET 4.0 installed on the machine which is running the tool. It runs very quickly and will create evenly sized files from your large source file. I used it recently on a ~16.7 Gig file and it was able to process the file in about 20 minutes (keep in mind I was using a slow 7200 RPM consumer hard drive. Smaller 2 Gig files processed in about a minute (again same slow disk). As time permits we’ll work on making the software run even faster.

Denny

Share

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?