Mirroring isn’t a backup solution

In case you live under a rock and haven’t heard about Journalspace.com’s little mistake, they have gone out of business due to a database problem.  Here’s a screenshot in case the site is down when you look at it.

In a nutshell, it appears that they were relying on a RAID 1 array as the database backup. A mirror faithfully copies every write to both drives, including deletes and corruption, so it protects against a failed disk and nothing else. We see this mistake all the time in small database shops, but as noted on /. this site had been up since 2002 and had an Alexa rank of 106,881 with 14k monthly visitors (according to Quantcast).  For a site that size to be making such a simple mistake is just unacceptable.
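To make the difference concrete, here is a minimal toy sketch in Python. It is not how journalspace.com’s storage worked; the `MirroredStore` class and its method names are made up for illustration. It only shows why a mirror and a point-in-time backup behave differently when data is destroyed.

```python
import copy


class MirroredStore:
    """Toy stand-in for a RAID 1 array (hypothetical, for illustration only).

    Every operation, including destructive ones, is applied to both copies.
    """

    def __init__(self):
        self.drive_a = {}
        self.drive_b = {}

    def write(self, key, value):
        # A mirrored write lands on both drives.
        self.drive_a[key] = value
        self.drive_b[key] = value

    def truncate(self):
        # A mirrored delete also lands on both drives.
        self.drive_a.clear()
        self.drive_b.clear()

    def snapshot(self):
        # A backup is an independent point-in-time copy.
        return copy.deepcopy(self.drive_a)


store = MirroredStore()
store.write("post-1", "first entry")
store.write("post-2", "second entry")

backup = store.snapshot()  # taken before the accident
store.truncate()           # the accident

print(len(store.drive_a), len(store.drive_b), len(backup))
```

After the truncate, both halves of the mirror are empty, while the snapshot taken beforehand still holds both entries. The mirror did its job perfectly, which is exactly the problem.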

I can guarantee you that if I see a DBA resume listing journalspace.com as a previous place of employment I’ll have to think more than twice about bringing them in.

As it says on the website, the business is closed, and they are going to be selling the domain name and any trademarks off to the highest bidder.

As DBAs we need to remember that the companies that we work for (or consult for) live and die by our mistakes.  If, for example, I were to do this same thing and we were to suffer the same fate at Awareness Technologies (my employer), I would be responsible for about 80 people being out of work.

This reminds me of a similar situation that happened when I worked for GameSpy.com.  The development team insisted on having full rights to all the production databases so that they could troubleshoot issues without having to wait for me to give them rights.

We were in the middle of deploying our new backup solution because the file server that we used to back up to couldn’t hold all the database backups any more.  Everyone had been informed that some databases weren’t being backed up while we got the new system up and running and that extreme care needed to be taken while working in the databases.

A few days later a developer comes to me asking me to restore the database with all the news articles in it back into production.  I inform him that we don’t have a backup of that database as it is one of the ones that we weren’t backing up while going through the transition.  I then ask why he needed it restored.

Apparently he thought that he was connected to the development database and truncated the table with all the articles that the site authors had written for the last several years.  Needless to say, the developers lost their access to the production databases that day.  It took the editors two or three days to get the current stories back into the system from the Microsoft Word copies that they used to do the initial story writing.

Fortunately in my story we were able to recover from the problem without much loss in revenue.  The folks at journalspace.com were not so lucky.  Not only will whatever employees they had be out of work, but the countless bloggers who blogged on that site have lost everything they wrote, some of them I’m sure for years.

This also points out the massive amount of trust that we as bloggers must have in the companies which host our blogs on our behalf.  We can only hope that they back up the databases which hold our blogs regularly so that none of what we write is lost.

I won’t begin to speculate about what happened to the database over at journalspace.com; they do a pretty good job of that within the message saying that the site is gone.  What I will say is that I agree: there was most probably a human element involved, which brings up database security and SQL injection protection.  If there was a human involved (again, there probably was), I would assume that additional database security and protection against SQL injection attacks would have prevented this from happening; although I guess if you aren’t going to back up the database, giving everyone write access to the database isn’t that much of a leap.  I would be curious to go through the web server logs from the time of the data loss to see if it was an injection attack or an employee logging into the database directly.
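For anyone who hasn’t seen what SQL injection protection looks like in practice, here is a small sketch using Python’s built-in `sqlite3` module. Nothing here reflects journalspace.com’s actual code or database platform, which I don’t know; the `posts` table and both function names are made up for the example. It just contrasts building a query by string concatenation with passing the value as a parameter.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (author, body) VALUES (?, ?)",
    [("alice", "hello"), ("bob", "world")],
)


def posts_by_author_unsafe(conn, author):
    # DON'T do this: user input is pasted straight into the SQL text,
    # so the input can change the meaning of the statement.
    query = "SELECT body FROM posts WHERE author = '" + author + "'"
    return [row[0] for row in conn.execute(query)]


def posts_by_author(conn, author):
    # Parameterized query: the driver passes the value separately,
    # so it is only ever treated as data, never as SQL.
    query = "SELECT body FROM posts WHERE author = ?"
    return [row[0] for row in conn.execute(query, (author,))]


malicious = "nobody' OR '1'='1"

unsafe_rows = posts_by_author_unsafe(conn, malicious)  # matches every row
safe_rows = posts_by_author(conn, malicious)           # matches nothing

print(sorted(unsafe_rows), safe_rows)
```

The example only reads data, but the same trick against a concatenated `DELETE` or `UPDATE` statement is how an attacker ends up modifying rows they should never be able to touch. Parameterized queries, combined with a web login that has only the rights it needs, close that door.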

A few hours (or possibly minutes) worth of changes to the database configuration could have kept this very popular site up for many more years to come.  It is a shame that they won’t be with us any more; they will be missed around the intertubes.

This was also talked about by Brent Ozar among others.



4 Responses

  1. Looks to me from their twitter they were in the middle of an upgrade and it went bad… very bad.

    Always break a mirror before a major upgrade– especially if you don’t have backups.

  2. I was going through the twitter, and it looks like the upgrade was from October 2007. The twitter account went dark for over two years then came back up with the problems a few days ago.

    From what I’m seeing, performance tanked and new hard drives were ordered. I would assume they tried to remove one old smaller/slower drive, insert a new larger/faster drive, and let the array rebuild, which should have gone fine unless the remaining drive had a hiccup; but that should have broken the array, not wiped the file.

    There isn’t much info in the twitter. More information would be nice so we can all analyze what happened.

  3. Ahhhh, I missed that in the year. I wondered why the months were all jumbled up.

    At any rate, you’d think they would have SOME sort of backup no matter how old.

  4. I’ve posted an update (http://itknowledgeexchange.techtarget.com/sql-server/journalspacecom-says-the-site-was-trashed-by-the-it-guy/) on the JournalSpace.com (http://www.journalspace.com/blog/) shutdown.
