Moving huge amounts of data between data centers

Every once and a while I get the fun project of moving a data center from one location to another, often from a Managed Service Provider to either a CoLo facility or to another Managed Service Provider.  When I’m doing this the data sizes typically aren’t small.  In the case of my current project one days worth of compressed database backups are running 581 Gigs which need to be transferred from a Managed Service Provider in Dallas to a CoLo in New Jersey with minimal cost and time spent.  When I’m doing these sorts of massive data migrations there are basically three different ways that these can be done.

  1. Copy the backups to a USB drive and ship it – aka. Sneaker Net
  2. Upload the files across the VPN using RoboCopy
  3. FTP the files up to the new site

While I love using #1 as it’s the easiest and usually the quickest way to move huge amounts of information it’s the most expensive, and it isn’t very repeatable.  If there’s a problem with the data transfer or you need to redo the data transfer then you need to purchase another USB drive from the first provider and have them ship it up to the new site.  As this all needs to be done pretty quickly that means that every time you do it you need to pay for overnight shipping which gets expensive, fast.  Not to mention that either you need to be at the destination site or you have to pay for remote hands to open the box and connect the hard drive.  In this case that means paying a few hundred bucks to have a guy at the remote site unbox the drive and connect it as the data center is in New Jersey and I’m in California a short 5.5 hour plane flight away.

Option #2 that I give here is a decent option as well, except that it only single threaded unless you do some really interesting stuff to spin up multiple copies of robocopy.  The reason that you want multiple threads running is because most Managed Service Providers has some sort of Quality of Service settings configured on their routers so that one connection isn’t able to take all the bandwidth available.  In this case each connection is limited to about 500kb of bandwidth so if I run several threads I get more throughput than if I run just a single thread.

Which leads me to option #3.  As I’m moving lots of database files it’s easy enough for me to do multi-threaded FTP as I can sent each file separately getting getter bandwidth (currently I’m pushing 1,662.90 kb per second).  I do this not with the native command line FTP or with the web browser, but by using a little FTP application which has long been abandoned by the developer called LeechFTP.  While it hasn’t been upgraded in years, and some of the error messages aren’t in English, it’s a great app for moving massive amounts of data in a multi-threaded process.

Now because FTP does totally suck when it comes to resuming from a failed upload process I add an additional step into the process, I take whatever data I’m planning on transferring and use either winrar or 7zip to break the files into smaller chunks.  Typically I’ll just take the entire folder that has all the backups and make one massive upload set out of it.  I usually break the files into 100 Meg segments as those will usually be able to be uploaded without any sort of interruption, and if there is a problem reuploading 100 Megs worth of data usually won’t take all that long.  Now I don’t bother to compress the data, I just put it into a single large rar or 7z fileset.  The reason that I don’t bother trying to compress the data is that it’ll take hours to compress and the time saved usually is pretty small if any (especially as these backups are already compressed).  Both winrar and 7z have store only options which usually run pretty quickly.  The example 581 Gigs of data that I’m working with here was able to be stored by 7z in about an hour and a half.

I’ve begun using 7z instead of winrar for these projects as I’ve found something very annoying about winrar when using it to copy files up to a network share (like say the shared network drive that the SQL Server is going to restore from).  When winrar decompressed all the data it wants to put it into the local %temp% folder which ends up filling the C drive of what ever server you are doing the work on, while 7z doesn’t have this annoying “feature”.

Once the full backups are copied up (~4 days in this case) I just unpack them and get the restore process started (keep in mind that I’ve got log backups being copied across the same network link as well.  I’ll take about how I get them across later on.



Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trust DCAC with your data

Your data systems may be treading water today, but are they prepared for the next phase of your business growth?