How to Test your Business Continuity and Disaster Recovery Plan

Testing is an essential part of your Business Continuity and Disaster Recovery Plan. Until you put your plans through simulated tests, all you have is theory, and you can’t be sure the plan will work in a real-life scenario. How much time and how many resources you put toward testing will depend on the type of test that is best for your organization. Generally, theory-based tests can be run several times per year and involve only a small number of employees, usually upper management and other key personnel. More comprehensive, hands-on tests require widespread participation and can be run once or twice a year. It may also be a good idea to bring in an outside consultant to look at your plan with fresh eyes and point out weak areas you may have missed. Here is an outline of some of the tests you can run on your recovery plans, starting with the least intensive options:

Types of BC and DR testing

Tabletop Exercise

A tabletop exercise is a discussion-based session that usually takes place in a conference room with upper management or executives. The purpose is to look over the plan, apply it to a few theoretical scenarios, identify gaps through brainstorming, and ensure that every business unit that will be needed is represented in the plan. It doesn’t take many resources and can be done routinely without becoming a burden.

Structured Walk-Through

In this type of exercise, each team member walks through their individual component of the plan in order to find any gaps. Usually this is done with a specific type of situation in mind, such as a hurricane or an earthquake.

Simulation Testing

For a simulation, you gather all of the personnel who will be involved in the response plan, walk through a simulated emergency, and see how well the plan functions in that situation. Simulations should be done at least once per year.

Parallel Test

In this type of test, failover systems are checked to make sure that they can perform real business operations and support key processes and applications in the event of a disaster. Primary systems still carry the full production workload during the test.

Cutover Test

This takes the parallel test further and uses the failover systems to support the full production workload. You completely disconnect the primary systems. This type of test gives you as close as possible to a guarantee that in the event of a disaster, your failover systems will be able to support your entire business.

Levels of BC and DR Testing

When you perform a test, you can test each system to a different level of depth. Here are some of the levels you may want to test your systems for:

Data Verification

This level of testing aims to ensure that files have been backed up properly, but it doesn’t test whether you can recover from those backups.
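
As an illustration of what a data verification check might look like, here is a minimal Python sketch that compares SHA-256 checksums of source files against their backup copies. The function names (sha256_of, verify_backup) and the directory layout are assumptions made up for this example; a real backup product will usually ship its own verification tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> dict[str, str]:
    """Compare every file under source_dir with its copy under backup_dir.

    Hypothetical helper: assumes the backup mirrors the source layout.
    Returns a mapping of relative path -> "ok", "missing", or "mismatch".
    """
    results = {}
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            results[relative.as_posix()] = "missing"
        elif sha256_of(source_file) != sha256_of(backup_file):
            results[relative.as_posix()] = "mismatch"
        else:
            results[relative.as_posix()] = "ok"
    return results
```

A check like this only confirms that the backup copies exist and match the originals. It says nothing about whether a working system can be rebuilt from them, which is what the deeper levels below are for.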

Database Mounting

Database mounting tests that a database has basic functionality such as being able to read the data.
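
To make the idea concrete, here is a small Python sketch that uses a SQLite file as a stand-in for a restored database. It opens the file read-only, runs SQLite's built-in integrity check, and reads the table list. The function name can_mount_and_read is hypothetical, and a production database server would need its own equivalent checks.

```python
import sqlite3
from pathlib import Path


def can_mount_and_read(db_path: Path) -> bool:
    """Open a restored SQLite file read-only and confirm it can be read."""
    # mode=ro makes sure the check never modifies or creates the file.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    try:
        connection = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False
    try:
        # integrity_check returns a single row containing "ok" for a sound file.
        (status,) = connection.execute("PRAGMA integrity_check").fetchone()
        # Reading the table list confirms the schema itself is readable.
        connection.execute("SELECT name FROM sqlite_master").fetchall()
        return status == "ok"
    except sqlite3.Error:
        return False
    finally:
        connection.close()
```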

Single Machine Boot Verification

This tests that a single server can be rebooted after it has gone down, but it doesn’t prove that the server will still be functional and productive to the business.

Runbook Testing

This includes multiple systems that work together to deliver a business service, such as a clustered database. This test serves to prove that a single business service can be restored. 

Recovery Assurance

This is the highest level of DR/BC testing. It encompasses multiple machines, application testing, service level agreement (SLA) assessment, and diagnostics to explain why any part of the recovery failed.

Disaster Recovery Testing Best Practices

Test Regularly 

23% of businesses never test their disaster recovery plan, and about 33% test only once or twice per year. Most large companies test their plans quarterly, and a quarterly schedule is a reasonable standard to aim for.

Have Clear Goals

This refers to setting Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). RPO is the maximum amount of data you are willing to lose, usually expressed as the window of time between the last usable backup and the outage. RTO is the maximum amount of time that can pass before services are restored. Together they measure your ability to restore services on time and before too much data is lost. Additionally, regulations in some industries, such as health care, require you to know and document your RTOs. About 65% of organizations fail their own disaster recovery tests, so it’s important to focus on meeting these goals.
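
The arithmetic behind these two objectives is simple enough to sketch in a few lines of Python. The function name evaluate_recovery and all of the timestamps and targets below are invented for illustration; substitute the values recorded during your own test.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_backup, outage_start, service_restored, rpo, rto):
    """Compare one test run against its RPO and RTO targets.

    Hypothetical helper: data written after last_backup is treated as lost,
    and downtime runs from outage_start until service_restored.
    """
    data_loss_window = outage_start - last_backup
    downtime = service_restored - outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    # Made-up example: backup at 02:00, outage at 05:30, service back at 08:30.
    report = evaluate_recovery(
        last_backup=datetime(2024, 1, 15, 2, 0),
        outage_start=datetime(2024, 1, 15, 5, 30),
        service_restored=datetime(2024, 1, 15, 8, 30),
        rpo=timedelta(hours=4),
        rto=timedelta(hours=2),
    )
    for key, value in report.items():
        print(f"{key}: {value}")
```

In this made-up run the RPO is met (3.5 hours of lost data against a 4 hour target) but the RTO is missed (3 hours of downtime against a 2 hour target), which is the kind of result a test should surface before a real disaster does.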

Outsource if Necessary

If you run a small company in particular, you may want to use a DRaaS (Disaster Recovery as a Service) provider. These providers offer many services around disaster recovery, including ongoing testing and 24/7 monitoring of DR solutions.