What Happened?

A couple of hours later the fire department gave the OK to restore the power to the room without the fire. In the room that was on fire, the water fire sprinklers had gone off.

If you aren’t familiar with what happens in a well-designed data center (which this is) when there is a fire, a chemical called FM-200 is released into the room. FM-200 is a gas that puts the fire out mainly by pulling the heat out of it (it is often described as removing the oxygen from the room, but that isn’t really how it works). If the fire is a basic fire (think cardboard, wood, etc.), the FM-200 will usually put it out. However, FM-200 isn’t perfect, and in some cases, including this case, water has to be dropped from traditional ceiling-mounted fire sprinklers.

Now as we all know, water and computers don’t mix. Typically when the fire sprinklers go off, the top one or two devices in each rack are going to be destroyed by the water. When the fire department got on site, they had the power to the room cut BEFORE they started spraying water on the fire. This is standard procedure for a data center, and it is why there’s a big button for them to hit which turns off the power. Obviously, the equipment in the rack that was on fire would have gotten soaked with water. The racks right next to that rack would have also been hit with large amounts of water, and the racks farther out would have caught overspray, splashback, etc. It’s water, and it goes everywhere.

Hopefully, every company hosted in the data center that had the fire has a disaster recovery plan. I say this because that data center will be down for days, possibly weeks. First the fire inspection needs to happen, and then the facility has to do the water cleanup (and pass the various other inspections) before they can even turn the power back on.
For a small company this could kill the company, and all of this because of a fire that wasn’t related to your equipment at all.
The question now is, how do I avoid this problem?
If the company is hosting its servers on-premises or in a colo facility, then a disaster recovery plan is going to be essential, as that’s the only way to recover services quickly. If the company is hosting the servers in Microsoft Azure, then I’d still recommend a disaster recovery plan. If you are hosted in a region with Azure Availability Zones, then disaster recovery is still important but slightly less critical, as you can configure all of your services across zones, which puts servers in two or three different buildings, usually on different campuses. While Microsoft won’t talk about the distance between the zones, they do state that the zones are sufficiently separated that a localized outage or disaster will not impact other zones in the region.
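To make the zone idea concrete, here is a minimal sketch of spreading servers across zones and checking what is left if one zone is lost. It only models the placement in plain Python and does not call any Azure API; the server names, zone labels, and both function names are made up for illustration.

```python
def spread_across_zones(servers, zones):
    """Assign servers to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("at least one zone is required")
    return {server: zones[i % len(zones)] for i, server in enumerate(servers)}


def survivors(placement, failed_zone):
    """Return the servers still running if one zone goes away."""
    return [server for server, zone in placement.items() if zone != failed_zone]


# Hypothetical example: four web servers across three zones.
placement = spread_across_zones(["web1", "web2", "web3", "web4"], ["1", "2", "3"])
print(placement)
print(survivors(placement, "1"))
```

With everything in a single room, the equivalent of `survivors` comes back empty, which is the situation the companies in that data center are in.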
The Long and the Short of it
The long and the short of it is that you need a disaster recovery plan for your company. If your disaster recovery plan is just to order new servers and start rebuilding, the lead time on servers could be weeks. In addition, that takes cash, and insurance could take weeks or months to pay out for the servers that were destroyed. And this doesn’t account for the staff time that is needed to rebuild the services and restore everything.
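As a rough back-of-the-envelope way to look at the "order new servers and rebuild" plan, the sketch below adds up the pieces mentioned above. Every number in it is a placeholder rather than a figure from this incident, and the model itself is an assumption: hardware can be ordered right away, so the server lead time overlaps with the facility being closed, and the rebuild starts only once both are done.

```python
from dataclasses import dataclass


@dataclass
class RebuildPlan:
    facility_closed_days: int  # inspections and water cleanup before power returns
    hardware_lead_days: int    # wait for replacement servers to arrive
    rebuild_days: int          # staff time to rebuild services and restore data

    def downtime_days(self) -> int:
        # Assumption: ordering overlaps with the closure; rebuilding waits for both.
        return max(self.facility_closed_days, self.hardware_lead_days) + self.rebuild_days


def survives(plan: RebuildPlan, max_tolerable_days: int) -> bool:
    """True if the estimated downtime fits within what the business can tolerate."""
    return plan.downtime_days() <= max_tolerable_days


# Placeholder numbers only: 10 days closed, 21 days server lead time, 7 days to rebuild.
plan = RebuildPlan(facility_closed_days=10, hardware_lead_days=21, rebuild_days=7)
print(plan.downtime_days())   # 28
print(survives(plan, 14))     # False
```

Plugging in your own numbers, including how many days your company can actually go without its IT systems, is a quick way to see whether "just rebuild" is a real plan.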
Given that it could take days or weeks to bring these systems back online, an outage like this could kill a small or even medium-sized company. Can your company survive without your IT systems for days or weeks? Because the reality is that the companies in the data center room which had the fire are going to have to deal with exactly that situation.
What would happen to DCAC if we were in the colo that had the fire in it? In short, nothing. Our colo has no customer-facing systems; those are all in Azure. Everything in the colo is there so that we can test solutions for our customers. We have nothing there which is production except for a couple of domain controllers and the AD Sync software. We have domain controllers in Azure, so if the colo were to go away for a month, it would be annoying but not much of an issue.

If you need help building your disaster recovery plan and/or moving your systems to Azure (along with a disaster recovery plan), contact the DCAC team, and we can start the project for you sooner rather than later and get your company into a protected state as soon as possible.

Denny