SAN snapshots, and I don’t care who your vendor is, by definition depend on the production LUN. We’ll that’s the production data.
That’s it. That’s all I’ve got. If that production LUN fails for some reason, or becomes corrupt (which sort of happens a lot) then the snapshot is also corrupt. And if the snapshot is corrupt, then your backup is corrupt. Then it’s game over.
Second rule of backups: Backups must be moved to another device.
With SAN snapshots the snapshot lives on the same device as the production data. If the production array fails (which happens), or gets decommissioned by accident (it’s happened), or tips over because the raised floor collapsed (it’s happened), or someone pulls the wrong disk from the array (it’s happened), or someone is showing off how good the RAID protection in the array is and pulls the wrong two disks (it’s happened), or two disks in the same RAID set fail at the same time (or close enough to each other than the volume rebuild doesn’t finish between them failing) (yep, that’s happened as well), etc. If any of these happen to you, it’s game over. You’ve just lost the production system, and the backups.
I’ve seen two of those happen in my career. The others that I’ve listed are all things which I’ve heard about happening at sites. Anything can happen. If it can happen it will (see item above about the GOD DAMN RAISED FLOOR collapsing under the array), so we hope for the best, but we plan for the worst.
Third rule of backups: OLTP systems need to able to be restored to any point in time.
See my entire post on SAN vendor’s version of “point in time” vs. the DBAs version of “point in time” .
If I can’t restore the database to whatever point in time I need to, and my SLA with the business says that I need to, then it’s game over.
Fourth rule of backups: Whoever’s butt is on the line when the backups can’t be restored gets to decide how the data is backed up.
If you’re the systems team and you’ve sold management on this great snapshot based backup solution so that the DBAs don’t need to worry about it, guess what conversation I’m having with management? It’s going to be the “I’m no longer responsible for data being restored in the event of a failure” conversation. If you handle the backups and restores, then you are responsible for doing them, and it’s your butt on the line when your process isn’t up the job. I don’t want to hear about it being my database all of a sudden.
Just make sure that you keep in mind that when you can’t restore the database to the correct point in time, it’s probably game over for the company. You just lost a days worth of production data? Awesome. How are you planning on getting that back into the system? This isn’t a file server or the home directory server where everything that was just lost can be easily replaced or rebuild. This is the system of record that is used to repopulate all those other systems, and if you break rules number one and two above you’ve just lost all the companies data. Odds are we just lost all our jobs, as did everyone else at the company. So why don’t we leave the database backups to the database professionals.
Now I don’t care what magic the SAN vendor has told you they have in their array. It isn’t as good as transaction log backups. There’s a reason that we’ve been doing them since long before your SAN vendor was formed, and there’s a reason that we’ll be doing them long after they go out of business.
If you are going to break any of these rules, be prepared for the bad things that happen afterwards. Breaking most of these rules will eventually lead to what I like to call an “RGE”. An RGE is a Resume Generating Event, because when these things happen people get fired (or just laid off, if they are lucky).
So don’t cause an RGE for yourself or anyone else, and use normal SQL backups.
The post Why aren’t SAN snapshots a good backup solution for SQL Server? appeared first on SQL Server with Mr. Denny.