Building a new CoLo – Part 1 of n

So if you follow me on Twitter you might have seen this tweet a little while back.

Since there aren’t many people out there that get the chance to buy and build a brand new data center from scratch, I figured that I’d go over the process with you.  This is the first of who knows how many blog posts on the topic.

The first step in buying colo space and moving into it involves getting completely fed up with your current hosting company.  Currently we are with a large managed hosting provider named RackSpace (I probably shouldn’t name them), and we have become totally fed up with them.  The costs are too high and we get almost nothing from their support team but grief.  They have actually unplugged a firewall’s power cable by accident in the middle of the day.  We now have paper signs taped to the racks holding our equipment saying not to touch anything in these racks between 6am Eastern and midnight Eastern without manager approval (or something to that effect), because it has happened so many times.

The first real step to moving into your own CoLo (this process has taken about a year at this point) is to figure out how much processing power and storage you need to purchase.  This doesn’t need to be an exact figure, just a rough estimate.  This will eliminate some hardware options for you.

You also need to know what features you are looking for.  Here are some questions that can help you figure these things out.

  • Are you going to virtualize servers?
    • A few large VM hosts?
    • Lots of little VM hosts?
  • Will you need storage level replication to another data center later on for DR?
  • If you will be virtualizing servers, will you need to be able to set up a Windows cluster as a VM?
  • How long do you need to keep backups for?
    • Onsite?
    • Offsite?
  • How much data growth is expected?
    • Over one year?
    • Over two years?
    • Over three years?
  • What IO rates need to be supported?
  • How much IO throughput needs to be supported?

So let’s break these questions down a little bit.

Are you going to virtualize servers?

This one is pretty much a given.  Most every company should be virtualizing at least some of their servers.  If nothing else, things like domain controllers and other infrastructure servers should be virtualized.  It just doesn’t pay to have physical servers sitting around using 1% of the CPU all day.  Other servers like web servers and app servers are also usually a no-brainer when it comes to virtualizing them.  The big questions come down to your mission critical servers: SQL Server, Oracle, Exchange (yeah I know, it’s not mission critical, but just wait for Exchange or mail to go down, then tell me it isn’t mission critical), SAP, etc.  These machines may or may not be good candidates for virtualization.

It’s OK to have some machines be virtual and others be physical.  In the case of this project everything is virtual except for the SQL Server cluster (too large to be a VM), the vCenter management server (’cause I’m old school and want it physical), monitoring (it’ll run on the vCenter server for the most part), and some physical appliances that have to be racked.  All the web, file, and infrastructure servers will be VMs.

In our case we are going with a few larger hosts instead of a bunch of smaller hosts.  As we went through the hardware review process we landed on Cisco UCS blades and servers.  For the VMware hosts we are running several dual socket, 8 core per socket blades with something like 96 or 128 Gigs of RAM (might be even more at this point) per blade.

For the SQL Server cluster we are also using blades, as they ended up being less expensive than their rack-mount counterparts.  The SQL Server blades are quad socket, 8 core per socket blades with 256 Gigs of RAM per blade.  We didn’t pick these blades for the VMware hosts because it was actually cheaper to use the dual socket blades than the quad socket blades, and nothing that will be a VM will be getting more than 4 or 6 vCPUs, so the smaller blades aren’t an issue.
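A quick way to sanity-check the “few large hosts” decision is to work out how many hosts a given VM inventory needs.  The sketch below is a minimal Python estimate, not how this project was actually sized; the 4:1 vCPU-to-core ratio, the single spare host (N+1), and the inventory numbers in the usage line are all hypothetical assumptions, and hypervisor memory overhead is ignored.

```python
import math


def hosts_needed(total_vcpus, total_vm_ram_gb, largest_vm_vcpus,
                 cores_per_host, ram_per_host_gb,
                 vcpus_per_core=4.0, spare_hosts=1):
    """Rough VM host count: whichever of CPU or RAM runs out first, plus spares.

    vcpus_per_core and spare_hosts are assumed defaults, not vendor guidance.
    """
    # A single VM can't be given more vCPUs than the host has physical cores.
    if largest_vm_vcpus > cores_per_host:
        raise ValueError("largest VM needs more vCPUs than one host has cores")
    # Hosts required if CPU is the limit, at the assumed vCPU:core overcommit.
    by_cpu = math.ceil(total_vcpus / (cores_per_host * vcpus_per_core))
    # Hosts required if RAM is the limit (no memory overcommit assumed).
    by_ram = math.ceil(total_vm_ram_gb / ram_per_host_gb)
    # Spare hosts let one blade fail or be patched without shutting VMs down.
    return max(by_cpu, by_ram) + spare_hosts


# Hypothetical inventory on dual socket, 8 core per socket, 128 GB blades.
print(hosts_needed(total_vcpus=200, total_vm_ram_gb=900, largest_vm_vcpus=6,
                   cores_per_host=16, ram_per_host_gb=128))  # 9
```

In this made-up example RAM, not CPU, decides the host count (8 hosts for memory versus 4 for CPU, plus the spare), which is often the case for VMware hosts and is one reason to load the blades up on memory.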

Will you need storage level replication to another data center later on for DR?

If you are planning on building a DR site at some point in the future this is important to know now.  It would really suck to buy a storage solution that doesn’t support this when you will need it in the future.  Just because you will need it doesn’t mean you need to buy the replication software now, or set up the second DR site now.  But you need to plan ahead correctly for the project to ensure that everything that you want to do with the hardware is supported.  Nothing sucks more than having to go to management in the middle of the DR build and tell them that all that storage that you’ve purchased will be useless and needs to be replaced, not only at the DR site but also at the primary site.  Issues like this can delay DR build out projects for months or years, as you now have to pause the DR build out (probably while still paying for the DR site and equipment), buy and install new storage, migrate to that storage, then restart the DR project and start up the replication.

In the case of this project, management said that yes, we will want to spin up a DR site probably within a couple of years, so this limited our search for equipment to storage platforms that fully support storage level replication.  This includes having consistency groups so that sets of LUNs are kept in sync together (kind of important for databases, Exchange, etc), integration with the Windows VSS provider, support for snapshots, etc.

Now if your storage doesn’t support replication, or you want to have a nice expensive storage array at the primary site and a much less expensive storage solution at the DR site, you can look into EMC’s RecoverPoint appliance.  It supports replication between two storage arrays and doesn’t even require that they be the same brand of array.  It isn’t a cheap solution, but if you’ve got a million dollar solution in one site and a $100k solution in another site, RecoverPoint might be a good fit.

If you will be virtualizing servers, will you need to be able to set up a Windows cluster as a VM?

The reason that this question needs to be asked is to ensure that the storage array supports iSCSI.  The simplest way to build a Windows cluster out of VMs is to use iSCSI from inside the guests to attach the VMs to the storage directly.  Most every storage array supports iSCSI these days, but there are some that don’t, so this is important to know.

How long do you need to keep backups for?

As much as we all hate dealing with backups, backups are extremely important.  And keeping backups for a period of time will save you some headaches in the event that a backup becomes corrupt.  Also there might be regulations on how long backups are kept around for.  Your SOX auditor might have a requirement, as might your HIPAA auditor and your PCI auditor.  You just never know what these guys might throw at you.

Then there’s the question of off site backups.  Having backups is great, but you need to get those backups off site in case something happens to the building that the backup system is in.  You’ve got a couple of different options here.

  1. Go old school and have Iron Mountain or someone similar pull the tapes and store them somewhere.
  2. Get a virtual tape library (VTL) and backup to that.  Then get a second VTL and put it in an office or another CoLo and replicate between the two.
  3. Put your backups on a LUN and replicate that LUN to another facility.
  4. Outsource the backups to the CoLo.

Option 1 is the way that it’s always been done.  It’s reliable, slow, and can be pretty costly.  Option 2 is a pretty new concept, probably just a few years old now.  It can work if your backups are small enough and you’ve got enough bandwidth.  Storing a month’s worth of backups can take a LOT of space.  Option 3 probably isn’t the greatest unless the only backups you need to worry about are SQL Server backups, as SQL Server can handle purging old backups itself.  Option 4 is worth looking at.  Depending on the amount of space needed and what your CoLo charges, it might be worth it to have the CoLo handle this for you.
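Whether option 2 is practical mostly comes down to arithmetic: how long does it take to push a night’s worth of backup data over the link between the two sites?  Here is a minimal Python sketch, assuming decimal units (1 GB = 8,000 megabits), a hypothetical 80% usable link efficiency, and a hypothetical dedupe/compression ratio; the backup size and link speed in the usage lines are made up.

```python
def replication_hours(backup_gb, link_mbps, reduction_ratio=1.0, efficiency=0.8):
    """Hours needed to copy one backup set to the second site.

    reduction_ratio (dedupe plus compression) and efficiency are assumptions.
    """
    # Data that actually crosses the wire after dedupe/compression.
    gb_on_wire = backup_gb / reduction_ratio
    # Convert gigabytes to megabits using decimal units: 1 GB = 8,000 Mb.
    megabits = gb_on_wire * 8 * 1000
    # Only part of the nominal link speed is usable after protocol overhead.
    usable_mbps = link_mbps * efficiency
    return megabits / usable_mbps / 3600


# 500 GB nightly over a 100 Mbps link, with and without a 10:1 reduction.
print(round(replication_hours(500, 100), 1))                      # 13.9
print(round(replication_hours(500, 100, reduction_ratio=10), 1))  # 1.4
```

If the result doesn’t fit comfortably inside the window between the end of the backups and the start of the next night’s backups, option 2 probably isn’t going to work without a bigger pipe.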

In the case of this project we went with a combination of options #1 and #2.  We have a VTL to back up to so that the backups run very fast (a VTL is basically just a separate storage array that is only used by the tape backup software, and it includes compression and deduplication to reduce the size of the backups).  So we will back up to the VTL, then copy the backups to tape.  Then Iron Mountain will take the tapes offsite for us.  The VTL will hold about 2 weeks’ worth of backups on site, which we’ll have a second copy of on tape.  Once we have the DR site we’ll get another VTL and replicate to it, probably increasing its storage to 4-6 weeks, and dump the need for tape and offsite backups, as everything will be backed up in two different CoLos in two different cities.
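Sizing the VTL for a given retention period is similar back-of-the-envelope math.  The sketch below assumes a hypothetical schedule of one full backup a week plus six daily incrementals, a hypothetical daily change rate, and a single flat dedupe/compression ratio; real VTL deduplication varies a lot with the data, so treat the output as a starting point for the vendor conversation, not a quote.

```python
def vtl_capacity_tb(full_backup_tb, daily_change_rate, retention_weeks,
                    reduction_ratio):
    """Approximate physical VTL capacity needed to hold retention_weeks of backups.

    Assumes one weekly full plus six daily incrementals per week and a single
    flat dedupe/compression ratio; all of these are simplifying assumptions.
    """
    fulls = retention_weeks
    incrementals = retention_weeks * 6
    # Logical (pre-dedupe) size of everything kept on the VTL.
    logical_tb = (fulls * full_backup_tb
                  + incrementals * full_backup_tb * daily_change_rate)
    # Physical space after dedupe and compression.
    return logical_tb / reduction_ratio


# Hypothetical 20 TB of data, 2% daily change, 10:1 reduction.
for weeks in (2, 4, 6):
    print(weeks, round(vtl_capacity_tb(20, 0.02, weeks, 10), 2))
```

With these made-up inputs the estimate grows in proportion to retention, from about 4.5 TB for 2 weeks to about 13.4 TB for 6 weeks, which is why stretching retention from 2 weeks to 4-6 weeks is a real capacity decision.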

How much data growth is expected?

Knowing how much space you need today is important.  Knowing how much space you need in 3 years is more important.  Just because a storage array supports your data size today doesn’t mean that it will support it in 3 years.  We use three years for a couple of reasons.  First, that’s typically how long the maintenance contract on the hardware is.  Second, that’s typically how long the financing term is for these kinds of purchases.

If you have 20 TB of space needed today, but in 3 years you’ll need 80 TB of space that’ll drastically change the kind of equipment that you can purchase.
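Going from 20 TB to 80 TB in three years works out to roughly 59% compound growth per year.  The minimal Python sketch below (the function names are made up for illustration) backs out that rate from the two figures and projects the year-by-year capacity, which is the number to hold vendors to rather than today’s figure.

```python
def annual_growth_rate(today_tb, future_tb, years):
    """Compound annual growth rate implied by two capacity figures."""
    return (future_tb / today_tb) ** (1 / years) - 1


def projected_tb(today_tb, rate, years):
    """Capacity needed after `years` of compound growth at `rate`."""
    return today_tb * (1 + rate) ** years


rate = annual_growth_rate(20, 80, 3)
print(f"growth rate: {rate:.1%}")  # growth rate: 58.7%
for year in range(4):
    print(year, round(projected_tb(20, rate, year), 1))
```

If your growth looks more like a fixed number of TB per year than a percentage, a straight-line projection is the better model; either way the point is to size for year three, not year zero.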

What IO rates need to be supported?

How much IO throughput needs to be supported?

The next two questions go right along with the prior one: how much IO needs to be supported, and how much throughput needs to be supported?  These numbers will tell you if you need an array which supports flash drives, and how many drives it needs to support.  Without these metrics you are totally shooting in the dark about what you actually need.
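Once you do have IO numbers, a common rule-of-thumb calculation turns them into a drive count: add the reads to the writes multiplied by the RAID write penalty (2 for RAID 10, 4 for RAID 5, 6 for RAID 6), then divide by what a single drive can deliver.  The per-drive IOPS figures below are rough assumptions rather than vendor specs, the sketch ignores array cache and flash tiering, and the workload in the usage lines is hypothetical.

```python
import math

# Rough random IOPS per spinning drive (rule-of-thumb assumptions, not specs).
DRIVE_IOPS = {"7.2k": 80, "10k": 140, "15k": 180}

# Back-end disk IOs generated by each front-end write, per RAID level.
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def drives_needed(front_end_iops, read_pct, raid, drive_type):
    """Minimum drive count to sustain the workload's random IOPS."""
    reads = front_end_iops * read_pct / 100
    writes = front_end_iops * (100 - read_pct) / 100
    back_end_iops = reads + writes * RAID_WRITE_PENALTY[raid]
    return math.ceil(back_end_iops / DRIVE_IOPS[drive_type])


def throughput_mb_s(iops, io_size_kb):
    """Throughput implied by an IOPS figure at a given IO size."""
    return iops * io_size_kb / 1024


# Hypothetical workload: 10,000 IOPS, 70% reads, on 15k drives.
print(drives_needed(10_000, 70, "raid5", "15k"))   # 106
print(drives_needed(10_000, 70, "raid10", "15k"))  # 73
print(throughput_mb_s(10_000, 64))                 # 625.0
```

When the drive count needed for your IOPS is far higher than the drive count needed for your capacity, that’s usually the signal to look at arrays that support flash drives.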

Once you’ve gotten all these questions answered you’d think that it’s time to start looking at hardware, and you’d be wrong.  It’s time to go to management and get this thing approved to move forward.  Join me next time as we look at that process.


P.S. This series will be at least half a dozen posts long.  I’ll be tagging all of them with the tag “Building a new CoLo” to make it easier to follow just these posts via RSS if you aren’t interested in the rest of my stuff.

