Many companies use co-located data centers to house their hardware. In some cases (like the colo we at DCAC use) you pay for power, cooling, and a connection to the internet. There is no expectation of added services beyond those three things. In other cases, companies like Rackspace or Level 3 are what are known as managed service providers (note: I’m not talking about Rackspace or Level 3 in this post, and I’m not going to name the guilty party, but if you want to know, you can reach me privately). Managed service providers offer solutions like shared storage, network management, and other value-added services beyond just a space for your servers.
Enter the Cloud
So Microsoft and Amazon are effectively playing in this space with their IaaS offerings. There’s a big difference, however, as the cloud providers have invested a great deal of money in automation. The same customer I’m talking about in today’s post has some Azure VMs that we are deploying. I built a VM and allocated 3 TB of SSD storage in about 10 minutes this morning. Pretty slick operation. I’m fairly certain that when I ran the PowerShell to deploy the VM, no person got up to do anything. When I added the storage, I’m pretty sure no SAN zoning took place, and if it did, it was a few lines of code. We had previously stayed away from Azure because it’s not the most cost-effective solution for very large workloads (colos tend to have slightly better pricing on big boxes, but you get nickel-and-dimed on other things).
When Your Colo Sucks
So I have two different workstreams going with the colo right now. One is to configure a site-to-site VPN to Azure. This should be a simple operation, but it took over a week to get in place, and only after I sent the colo the Cisco instructions on how to configure the VPN were they able to tell me that the Cisco device they had didn’t support the latest route-based VPN in Azure. So we finally got up and running, and then we discovered that we couldn’t reach the Azure VMs from certain on-prem subnets. We asked them to make a change to add those subnets and they completely broke our connection. Awesome.
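For what it’s worth, the “which on-prem subnets are missing from the VPN config” question is the kind of thing you can check in a few lines before anyone touches the firewall. Here is a minimal sketch in Python, assuming you have the list of on-prem subnets and the list of prefixes the tunnel is configured to carry. The function name and all the addresses are hypothetical examples, not the customer’s actual network or anything the colo ran.

```python
import ipaddress


def uncovered_subnets(on_prem, vpn_prefixes):
    """Return the on-prem subnets not contained in any VPN prefix.

    Both arguments are lists of CIDR strings (hypothetical inputs).
    """
    prefixes = [ipaddress.ip_network(p) for p in vpn_prefixes]
    missing = []
    for cidr in on_prem:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError across IP versions, so compare
        # versions first and only then test containment.
        covered = any(
            net.version == p.version and net.subnet_of(p) for p in prefixes
        )
        if not covered:
            missing.append(cidr)
    return missing


if __name__ == "__main__":
    # Example addresses only (assumptions for illustration).
    on_prem = ["10.1.0.0/24", "10.1.5.0/24", "10.2.0.0/24"]
    vpn_prefixes = ["10.1.0.0/16"]
    print(uncovered_subnets(on_prem, vpn_prefixes))
```

Anything this returns is a subnet that will not be able to reach the Azure side until someone adds it to the tunnel definition.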
The other workstream is a cluster upgrade. I wanted a new cluster node and storage, so we didn’t have to do an in-place upgrade. We started this process about 3 weeks ago, hoping to do the migration on Black Friday. We had a call today to review the configuration. Turns out they had nothing in place, and aren’t even sure they can get a server deployed by NEXT FRIDAY (YES, 10 days to deploy a server, when your job is to deploy servers). I heard lots of excuses like: we aren’t working Thurs/Fri, we have to connect to two different SANs, and we might not have that fibre in stock. It wasn’t my place to yell WHAT THE EVERLOVING $%^^ on the call, so I started live tweeting. Because that’s ridiculous. Managing and deploying infrastructure was what I did for a living, and I wouldn’t have had a job if it took me 10 days to deploy a server, and that wasn’t even my only job. It really is the colo’s only job. How the #$%^ do you not have fibre in stock? Seriously? My lab at Comcast had all the fibre I could possibly need.
Edited to add this:
This is after last month when they confused SAN snapshots with SAN clones (when it takes 4 hours to recover from a “snapshot” it’s a clone) and presented production cluster storage (that was in use) to a new node. Awesome!!!
Why the Cloud will Ultimately Win
Basically, when it comes to repetitive tasks like deploying OSs and setting up storage, software is way better than humans. Yeah, you need smart engineers and good design, but Azure and AWS are already 90% of the way there. Also, their service levels and response times are much better, because everything is standardized, which makes troubleshooting and automating much easier.