People often get HA (High Availability) and DR (Disaster Recovery) mixed up. There are a couple of reasons why this happens. The first is that there aren’t any clear guidelines which separate the two. There are standard terms which are used to help define HA and DR, but there’s nothing which says this is how you build an HA environment and this is how you build a DR environment. The reason for this is that every system out there is different, and every system has a different set of requirements.
The second is that, all too often when I talk to people about HA and DR, they pick the technology that they want to use before they finish defining the requirements of the HA or DR platform. This presents a problem because after the technology is picked the system is pigeonholed into the solution which has been selected, often without the RTO and RPO ever being fully defined.
The RTO and RPO are the Recovery Time Objective and the Recovery Point Objective. The RTO is the maximum amount of time that it should take to get the system back online after a critical system failure. The RPO is the maximum amount of data which can be lost while bringing the system back online after a critical system failure. Neither of these is a number which can be defined by anyone in the IT department. Both numbers need to be defined by the business unit. If the numbers aren’t defined by the business unit that owns the system, the numbers basically don’t mean anything. The reason I say this is that IT probably doesn’t have a good understanding of the monetary losses from 10 minutes of data loss, or from the system being down for 24 hours while it is being brought back online.
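To make the two numbers concrete, here is a minimal Python sketch of how a DR test result could be checked against them. The class name, the function name, and the 4 hour and 10 minute targets are all made up for illustration; the real targets have to come from the business unit.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Targets supplied by the business unit that owns the system."""
    rto: timedelta  # longest acceptable time to bring the system back online
    rpo: timedelta  # largest acceptable window of lost data


def meets_objectives(objectives: RecoveryObjectives,
                     downtime: timedelta,
                     data_loss_window: timedelta) -> bool:
    """Return True only if both the outage length and the data loss are within target."""
    return downtime <= objectives.rto and data_loss_window <= objectives.rpo


# Hypothetical targets: back online within 4 hours, at most 10 minutes of data lost.
order_entry = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(minutes=10))

# A hypothetical DR test that took 3 hours and lost 15 minutes of data
# meets the RTO but misses the RPO, so the test as a whole fails.
test_passed = meets_objectives(order_entry, timedelta(hours=3), timedelta(minutes=15))
```

The point of keeping the two values separate is that a solution can satisfy one and still miss the other, which is exactly what gets overlooked when the technology is picked first.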
Different situations require different solutions. Not every solution in a single shop needs to be the same. Your most important systems should have a solution with a low RTO and RPO, while the less important systems can have a higher RTO, and possibly a higher RPO. The companies that try to build a single HA solution (and/or a single DR solution) are the companies that are destined to have their HA, and especially their DR, solutions fail, usually in a fantastic blaze of glory.
When looking at your HA and DR solutions, don’t look just within SQL Server. There are a variety of other technology solutions which can be used when designing HA, and especially DR, solutions. This is especially true in the storage space when it comes to data replication. Look at your vendor’s solutions as well as solutions from third-party providers. While there aren’t many third-party data replication solutions, there are some out there that can be leveraged, such as EMC’s RecoverPoint appliances. But like all the solutions which are available, these aren’t an end-all, be-all solution either. Like every other option, they should be used where they make sense, and not everywhere just because they were purchased.
Microsoft’s new feature in SQL Server “Denali” called “Always On” (aka HADR, HADRON) is marketed as a total HA/DR solution for SQL Server databases, and it is a pretty good looking solution. However, I don’t think that it’ll be the end-all, be-all solution. It’s going to have limitations that have to be worked around, just like any software-based solution. It will, however, make for a powerful tool in the toolbox which is available to us as IT administrators.
When it comes time to design your company’s HA and DR strategy, don’t get locked into thinking about one specific technology. Look at all the options that are available to you, and learn about them so that you can make the decision to implement the correct solution for the specific task at hand. For one system you might use database mirroring for your HA platform and log shipping for your DR platform. For another you might use clustering for your HA platform and mirroring for your DR platform. For another you might use clustering for both your HA and DR platforms. It really and truly all depends on the needs of the system which you are designing for.
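One way to keep a mixed environment like this straight is to write the pairings down as data. The Python sketch below is only an illustration: the system names are hypothetical, and the three HA/DR pairings are simply the three examples above, not a recommendation for any particular system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RecoveryPlan:
    ha: str  # technology used for high availability
    dr: str  # technology used for disaster recovery


# System names are hypothetical; the pairings are the three examples from the text.
plans: Dict[str, RecoveryPlan] = {
    "order_entry": RecoveryPlan(ha="database mirroring", dr="log shipping"),
    "reporting": RecoveryPlan(ha="clustering", dr="database mirroring"),
    "intranet": RecoveryPlan(ha="clustering", dr="clustering"),
}


def technologies_in_use(plans: Dict[str, RecoveryPlan]) -> List[str]:
    """Return the distinct technologies that staff and run books need to cover."""
    return sorted({tech for plan in plans.values() for tech in (plan.ha, plan.dr)})
```

Calling `technologies_in_use(plans)` on this catalog shows that three systems with three different plans still only add up to three technologies to learn.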
The big argument that I hear from companies that have a single HA solution and a single DR solution is that because there is only one solution it is much easier to train staff on how to manage that one platform. And that is certainly true; teaching someone one thing is much easier than teaching them five things. However, your IT staff isn’t a group of monkeys working from memory when working on these systems (if they are, please send pictures). When DR tests are done, and when systems are failed over to DR, specific run books are used to ensure that everything comes online correctly and in the correct order. So if everything is going to be laid out in a run book anyway, why not have a few different technologies in use when it makes sense?
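The "correct order" part of a run book can be thought of as a list of steps with dependencies between them. As a rough sketch only, with step names that are invented for this example, Python's standard `graphlib` module (3.9 and later) can turn those dependencies into an order to work through:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each hypothetical step maps to the set of steps that must finish before it starts.
runbook = {
    "bring storage online at DR site": set(),
    "recover databases": {"bring storage online at DR site"},
    "start application servers": {"recover databases"},
    "repoint DNS to DR site": {"start application servers"},
    "verify logins with business users": {"repoint DNS to DR site"},
}


def ordered_steps(runbook):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(runbook).static_order())


steps = ordered_steps(runbook)
```

Nothing in this ordering cares which HA or DR technology sits behind the "recover databases" step, which is why a well written run book makes a mix of technologies manageable.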
The whole point of this, I suppose, is don’t get locked into a single solution. Use the right tool for the right job, even if it takes a little more time, or a little more money, to get set up. In the long run, using the right tool for the right job will make keeping your database applications online a much easier process.