We were recently working with a client who wanted us to consolidate several existing standalone servers into a virtualized Power Systems environment. The existing Power servers all had different backup media and differing backup strategies. While working through their needs, I had a conversation with a consultant we work with regarding a different mutual client. That conversation got me thinking about some of the things I’ve seen over the years, good and bad.
First, back to the mutual client: they had a system hardware failure (on a very old System i) several months ago that this consultant resurrected for them. Remember, the key here is ‘several months ago.’ Within the last few weeks they experienced another catastrophic hardware failure. During the recovery process, the consultant asked the age of the most recent backup. Their reply: “The last time we backed up anything was after you rebuilt our system several months ago.” Not a good answer.
We have a large number of clients with an equally large number of backup strategies. The key questions are: do they work, and are they executed properly? In most cases, yes, and those aren’t the ones we talk about years later. We have clients that back up to virtual tape libraries and then to tape, so everything is duplicated on-site and off-site. We have clients that replicate their daily backups to a hosted off-site virtual tape library. Others have hot or warm DR sites with recent backups replicated to them, and still others have full high availability scenarios ready to go at a moment’s notice. Many of these even run annual recovery tests – a novel concept. The interesting aspect here, at least to me, is the number of enterprises that never test their recovery plans.
On the other end of the spectrum are the cases we do talk about years later. There’s the client that called to tell us they needed service on their tape drive. The problem: they used a simple two-tape rotation (insufficient under any circumstances), and they had been rotating the same two tape cartridges for many, many years. Finally, the plastic film of one of the tapes simply self-destructed from age as it was fed into the unit. The drive stopped working once it filled itself with thin plastic chips. A simple vacuum cleaner and new tapes fixed that problem. Their saving grace: it happened during a save and not a restore.
Then there was the client whose server took a large electrical jolt. Once the service engineer was done, it was time to recover the system. The newest backup was over ten years old. That included not only the OS and enterprise applications, but also the data! The OS and the packaged business applications had both been updated several times since the last backup. The OS was not a major hurdle, but it was one that wouldn’t have needed jumping with a good system backup plan. The business applications were another matter. The software vendor was able to go back into their archives and create a multi-step (and multi-day) recovery process to bring them up to current code. The data? *shakes head* That was a total manual reload. They were off-line for weeks – and the guy kept his job??
I have several other similar stories, but I’d be interested in hearing some of your experiences. If you have a tale of woe you’d like to share, I’d love to hear it.
Michael Miller, President, Arbor Solutions, Inc.