I recently read a post about backups on slaw.ca, a legal blog. The author had taken the initiative to research and purchase some backup software, only to be disappointed by the effort it took to configure it and get it running correctly. This got me thinking about my own experience over the years with backup and restore procedures, about being in emergencies and having to rely on backups, and most importantly about the expectations of clients.
Many small and medium-sized businesses think that as long as they have a backup, they can restore a system easily. It is true that good backup procedures can give a business confidence that it will still have all its data should a failure occur. However, the idea that a business can be back up and running quickly after a disaster simply because these backup procedures are in place is typically wrong. Backup is different from disaster recovery.
I recently assisted a law firm that exemplified the problems and perceptions associated with backups. The firm had a server that was about five years old, and it started having memory errors and crashing suddenly. The obvious thing to do was to replace the bad memory, but doing so was not trivial: the memory type the server used was no longer available, and attempts to locate memory compatible with the motherboard were unsuccessful. The decision was made to order a new server, which took two weeks to arrive.
Of course, once it did arrive, it was not a simple matter of restoring the backup and going. The operating system, in this case Windows Server 2003, had to be installed and configured. Once the O/S is installed, you typically have to re-install the applications. For one thing, the backup software itself has to be installed before you can actually run the restore process. For another, very few applications will just ‘restore’ to a disk, especially in a Windows environment, where applications typically make use of a central repository called the registry. The registry contains everything from application configuration settings to file locations to user configuration settings, and the list goes on. Although you can back up and restore a registry, and most decent backup software does this, the registry is a reasonably dynamic entity. As such, it is rarely if ever a clean restore, and more often than not it is quicker and easier to re-install the applications, which re-creates the registry entries, than to restore the registry and debug the result.
In this particular case, because it was a new server, different drivers had to be installed, which affected both the files on the backup and the entries in the registry. Finally, we had to restore the data files and databases and test everything. The whole process took an entire weekend.
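To make the gap between "we have a backup" and "we are running again" concrete, the rebuild described above can be tallied step by step. The following is only a rough sketch in Python. The step names follow the story, and the two-week delivery time comes from it, but every other hour figure is a hypothetical placeholder I have made up for illustration, as are the names RecoveryStep, total_hours and PLAN.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    hours: float     # estimated elapsed time for this step
    hands_on: bool   # True if someone must actively work on it


def total_hours(steps, hands_on_only=False):
    """Sum the estimated hours, optionally counting only hands-on work."""
    return sum(s.hours for s in steps if s.hands_on or not hands_on_only)


# Only the two-week delivery comes from the story above. Every other
# figure is a hypothetical placeholder; substitute your own estimates.
PLAN = [
    RecoveryStep("Wait for replacement server", 14 * 24, hands_on=False),
    RecoveryStep("Install and configure operating system", 4, hands_on=True),
    RecoveryStep("Install backup software and applications", 8, hands_on=True),
    RecoveryStep("Install drivers for the new hardware", 2, hands_on=True),
    RecoveryStep("Restore data files and databases", 6, hands_on=True),
    RecoveryStep("Test", 4, hands_on=True),
]

if __name__ == "__main__":
    for step in PLAN:
        print(f"{step.name:<45} {step.hours:>6.1f} h")
    print(f"Hands-on rebuild: {total_hours(PLAN, hands_on_only=True):.1f} h")
    print(f"Total downtime:   {total_hours(PLAN) / 24:.1f} days")
```

Even with invented numbers, a list like this shows that restoring the data is only one line among several, and that waiting on hardware can dwarf all the hands-on work combined.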
I have consulted for companies that take images of their hard drives in an attempt to solve this problem. They typically do this at specific intervals, once every few weeks or every few months. Should a system fail, the idea is that they can re-image the drives, which restores the operating system, registry, applications, and configurations. Next, they restore the latest backup on top, and everything is back to normal. Although you can image a live drive, to ensure the integrity of an image you should take the drive offline, or at least put it in read-only mode, during the imaging process. This is typically not feasible in today’s environments, where systems need to be constantly up and online. Even if you have a mechanism to image successfully (VMware has a process to do this), chances are that the drivers will be different if you have to replace a server or motherboard, or that the drive size will be different, either of which makes the drive image harder to restore.
The only effective implementations of a disaster recovery solution I have seen are those at large institutions. One financial institution, for example, kept duplicate systems, memory, CPUs, and drives in storage in case there was a failure. Everything was imaged, and data was also backed up to a central backup system. The ability to restore a system was tested and documented at regular intervals. This type of redundancy and rigor costs money and time, both of which are difficult to come by for most small businesses.
Backups are not the same as disaster recovery. In times of system failure or worse, businesses need to be very cognisant of the amount of time it can take to fully recover and get systems, applications, and functions back to normal. In the event of a disaster, it is important that businesses understand the true amount of time they will likely be down and plan accordingly.
