One of the largest hosting providers in Europe recently had a major fire at one of its facilities that completely destroyed a data center and forced the whole site to shut down. The company publicly advised affected clients to activate their disaster recovery plans, but what if there was no plan?
Disaster recovery planning and management are often overlooked by CTOs because an incident of that magnitude is indeed unlikely. However, the recent outage shows yet again how devastating the consequences of not having proper backup and disaster recovery processes in place can be. The associated risks often have a severe to catastrophic potential impact: clients that have not taken the necessary precautions will face significant downtime and possibly compromised applications, affecting the entire organization. These risks need to be mitigated, and they can generally be traced to one of three overarching themes: a lack of backups, no documented disaster recovery plan, or a disaster recovery plan that has not been tested.
Regardless of the hosting setup, any IT organization must take regular backups. The example above also stresses the importance of keeping the backup facility geographically separated from the primary data center in case of regional destruction. Redundant application servers are also needed, so that applications can be run elsewhere in the event of an outage or disruption.
Maintaining a backup infrastructure provides no guarantees if there is no strategy to govern its use. A proper disaster recovery plan should include all the procedures needed to restore the infrastructure from a backup, prioritized according to Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO defines the maximum application data loss that can occur before there is significant harm to the business, measured as the time between the most recent backup and the point of disruption. RTO defines the downtime tolerance, measured as the time it takes to resume services after a disruption is reported. The different roles and their respective responsibilities should be outlined in the disaster recovery plan so that the process does not depend on any key individuals. The plan should be revised regularly and stored in a centralized location where members of the IT organization can easily access it.
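To make the RPO and RTO definitions above concrete, the following is a minimal sketch in Python of how an incident or a recovery exercise could be checked against a plan's targets. The class name, function names, and all timestamps and thresholds are hypothetical and chosen purely for illustration. For simplicity, the sketch also assumes the disruption is reported at the moment it occurs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Hypothetical per-application targets taken from a disaster recovery plan."""
    rpo: timedelta  # maximum tolerated data loss, expressed as a time span
    rto: timedelta  # maximum tolerated downtime


def data_loss_window(last_backup: datetime, disruption: datetime) -> timedelta:
    """Time between the most recent backup and the point of disruption."""
    return disruption - last_backup


def downtime(disruption: datetime, services_resumed: datetime) -> timedelta:
    """Time from the disruption until services are running again."""
    return services_resumed - disruption


def meets_objectives(
    objectives: RecoveryObjectives,
    last_backup: datetime,
    disruption: datetime,
    services_resumed: datetime,
) -> dict:
    """Compare the observed data loss window and downtime with the targets."""
    return {
        "rpo_met": data_loss_window(last_backup, disruption) <= objectives.rpo,
        "rto_met": downtime(disruption, services_resumed) <= objectives.rto,
    }


# Illustrative values only: a 4-hour RPO and an 8-hour RTO.
objectives = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=8))
result = meets_objectives(
    objectives,
    last_backup=datetime(2024, 1, 1, 0, 0),
    disruption=datetime(2024, 1, 1, 6, 0),
    services_resumed=datetime(2024, 1, 1, 12, 0),
)
print(result)  # {'rpo_met': False, 'rto_met': True}
```

In this example the last backup was six hours old when the disruption hit, so the 4-hour RPO is missed even though services came back within the 8-hour RTO. A gap of this kind would typically point to a backup frequency that is too low for the stated objective.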
It is not uncommon for companies to have a disaster recovery plan in place that has seldom been activated or tested, which raises the question of whether it is exhaustive enough to restore the IT environment within the required timeframe should a disaster occur. An untested disaster recovery plan may also lead to unexpected errors that delay the failover procedure. Regular disaster recovery exercises with end-to-end testing will shed light on potential flaws and misalignments in the plan so that they can be mitigated. They also enable the plan to evolve and adapt to the organization.
BearingPoint offers Technology Review and Due Diligence services that address disaster recovery maturity and suggest mitigating actions for any associated risks. Two-thirds of the recently analyzed target IT environments carry one or more risks that can be traced to the themes above and would likely suffer severe downtime should an incident like this occur. Feel free to contact us with inquiries regarding our services.