Disaster Recovery & Business Continuity – Did You Validate?

Today’s business world shudders at the word “disaster.” With hurricanes, super storms, earthquakes, blizzards, fires, flooding, and sustained power outages occurring unpredictably, it is a wonder that companies can recover from, and survive, all of the natural and modern threats to their daily business operations and business data. Today’s businesses depend on the data they create and store for many aspects of daily operations: financial data, product specifications, designs, testing records, and quality management system (QMS) data are only a few examples.

Every business produces, consumes, uses, and reports data; loss of access to, complete loss of, or corruption of data records can cost a company revenue or even end its existence. Current national estimates of the cost of data loss and recovery, as reported by the data restoration industry and the U.S. National Archives, range between $18 billion and $26 billion annually.

Disaster Recovery / Business Continuity

The cost alone should prompt companies to plan for disasters, but what exactly is disaster recovery? Disaster recovery is the act of restoring a system to full normal operation after an outage. It includes identifying the incident, quarantining the affected systems, reviewing the risk of potential losses against the last known good state, and restoring the system to normal operations. Business continuity, on the other hand, is the overarching set of workarounds, procedures, processes, and planned actions needed to maintain business operations and flows during the outage, during the recovery actions, and beyond.

It should be noted that, for a validated system, the change control procedures, as well as a verified, formal backup and recovery process, are critical parts of the overall implementation.

The means of controlling and rolling back changes are a regulatory requirement to prevent a system and its data from becoming non-compliant or suspect. Periodic reviews of a system’s operation and usage are critical to monitor its “validated state” and keep it in a “known good” configuration.

Disaster recovery plans (DRP) and business continuity plans (BCP) are key processes for ensuring that the risks of an outage are understood and that the criticality of any data loss and recovery operation is managed, allowing business operations to continue.

Data loss, access to data, and restoration of data systems are critical components of a company’s business processes. Whether it is a small, single-user incident or a company-wide outage, the loss of data can cripple projects, processes, or critical operations. Planning for outages is vital and may be accomplished by creating business continuity plans and disaster recovery plans, and by establishing “known good” (tested) backup and recovery procedures.

When a system is restored to functional use, the company must assess the risks and plan for re-collecting any lost data, as well as for merging the data collected through workarounds and downtime processes during the outage. This is accomplished through the BCP, the DRP, and the associated backup and recovery processes.

Business Continuity Plans (BCP)

The BCP is the overall governing document that defines the main strategies and critical operations needed to continue business operations when an outage occurs.

  • Maintains critical operations during outage events.
  • Defines the minimum critical requirements for operating conditions during unexpected outages.
  • Is designed to mitigate potential losses and to minimize disruption and financial impact, even during minor events.
  • Defines the company’s strategies and processes to follow during outages.
  • Defines the critical timing for restoring critical systems to support the business (the maximum allowed downtime for a system before it cripples operations).
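
The maximum allowed downtime is most useful when it can be checked against a live outage. The following Python sketch is illustrative only: the `CriticalSystem` class, the `downtime_status` function, and the 25% warning threshold are hypothetical choices, not part of any specific BCP template.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CriticalSystem:
    """A system named in the BCP (hypothetical structure)."""
    name: str
    max_downtime: timedelta  # maximum allowed downtime before operations are crippled


def downtime_status(system: CriticalSystem, outage_start: datetime, now: datetime) -> str:
    """Compare an ongoing outage with the system's maximum allowed downtime."""
    remaining = system.max_downtime - (now - outage_start)
    if remaining <= timedelta(0):
        return "EXCEEDED"  # the BCP limit has been reached or passed
    if remaining <= system.max_downtime * 0.25:
        return "WARNING"   # assumed threshold: 25% or less of the allowance is left
    return "OK"
```

For example, with an 8-hour maximum for a QMS and an outage that started at 08:00, the check reports "OK" at 10:00, "WARNING" at 15:00, and "EXCEEDED" from 16:00 on.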

Disaster Recovery (DR) / Disaster Recovery Plans (DRP)

These are IT-focused plans designed to restore the operability of systems, applications, or computer facilities at an alternate site after a major, usually catastrophic, event.

1. Defines safety actions required to protect company assets, data, equipment, and personnel.

2. Concentrates on deploying technology and equipment, writing scripts, and conducting periodic tests or drills to practice recovering systems after a disaster.

3. Defines the backup and recovery methods or processes used, including but not limited to:

  • Offsite Recovery (Rebuilding a lost location)
  • Lost Server Room Recovery (Restoring a data center)
  • Lost Business-Critical Operation

DR is a necessary action, but it is not sufficient by itself. The BCP defines the critical risks and the actions necessary to support continued business operations during and after the outage.

Backup and Recovery Processes

These are the IT procedures and work instructions for performing maintenance actions that protect a company’s intellectual property and critical data records. They detail the operation of the company’s backup and restoration equipment, data storage, media rotation, offsite management, and backup image testing.

These processes are used in conjunction with the Disaster Recovery plan to restore a system to the last known good image.

Regulatory requirements define the backup process as critical for ensuring that changes to a validated system can be rolled back.

Some data may still be lost after recovery operations; any such losses should be reviewed to determine what actions are required as part of the overall restoration process.
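
Restoring to the “last known good image” implies two conditions: the image was verified by a test restore, and it was taken before the incident. A minimal Python sketch, assuming hypothetical `BackupImage` records that carry a `verified` flag set by backup image testing:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class BackupImage:
    """A backup image in the rotation (hypothetical structure)."""
    image_id: str
    taken_at: datetime
    verified: bool  # True only if a test restore of this image succeeded


def last_known_good(backups: Sequence[BackupImage], incident_at: datetime) -> Optional[BackupImage]:
    """Return the newest verified image taken before the incident, or None if there is none."""
    candidates = [b for b in backups if b.verified and b.taken_at < incident_at]
    return max(candidates, key=lambda b: b.taken_at, default=None)
```

One design note: `incident_at` should be when the problem began (for example, when data corruption started), which may be earlier than when it was detected; otherwise a corrupted image could be selected as “known good.”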

Incident Management

Before a system outage occurs, companies should perform a risk analysis of their critical computer systems and associated processes as part of the business continuity plan. Key areas to review and act on are:

What computer systems are vital to day-to-day operations?

Identify and document the business systems needed (e.g., accounting, QMS, CAPA, safety, production testing). Are manual backup processes defined, and are the existing procedures documented?

What are the Business Vulnerabilities?

  • Hardware / Software
  • Information / Data
  • Systems and Processes
  • Buildings / People
  • Partnerships / Suppliers
  • Other

Determine if there have been past interruptions

  • What is the potential for future disruptions?
  • What is the cost of such a disruption, if known?
  • Consider factors unique to the local environment or community.
  • Evaluate and rate the probability of business interruptions.
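
One common way to evaluate and rate interruptions is a probability × impact score. The article does not prescribe a scale, so this sketch assumes ratings from 1 to 5; the `Vulnerability` class and `rank_vulnerabilities` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vulnerability:
    """One rated business vulnerability (hypothetical structure)."""
    area: str         # e.g. "Hardware / Software"
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (business-ending)

    def score(self) -> int:
        return self.probability * self.impact


def rank_vulnerabilities(items: List[Vulnerability]) -> List[Vulnerability]:
    """Validate the ratings and return the items ordered from highest to lowest risk."""
    for v in items:
        if not (1 <= v.probability <= 5 and 1 <= v.impact <= 5):
            raise ValueError(f"ratings for {v.area!r} must be between 1 and 5")
    return sorted(items, key=lambda v: v.score(), reverse=True)
```

With hypothetical ratings of Hardware / Software (3, 4), Partnerships / Suppliers (2, 3), and Buildings / People (1, 5), the ranking is 12, 6, and 5, so hardware and software would be addressed first.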

Identify and document critical systems and their associated procedures: hardware and software requirements, recovery plans, communications / reporting, people, processes, procedures, etc.

  • Identify the recovery times needed to prevent or reduce financial losses, considering product loss, operational downtime and timing, manual processes (workarounds), data collection in the workaround process vs. the normal process, and data merge (recovery methods).
  • Ensure both the recovery methods and workaround processes meet regulatory needs.

What to Do When an Outage Occurs

Identify the scope of the outage:

  • What systems, processes, data, and product are affected?
  • What is the scope of the outage?
  • When was the last known good backup?
  • How much data could be lost, corrupted, or otherwise affected?
  • How quickly can it be accessed and restored?
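
The questions about the last known good backup and how much data could be lost can be answered with simple arithmetic. This hedged sketch assumes that everything entered after the last good backup and before the outage is at risk, and that records are created at a roughly steady rate; `data_loss_window` and `estimated_records_at_risk` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup: datetime, outage_start: datetime) -> timedelta:
    """Span of time whose data exists only on the failed system."""
    if outage_start < last_good_backup:
        raise ValueError("outage_start must not precede the last good backup")
    return outage_start - last_good_backup


def estimated_records_at_risk(window: timedelta, records_per_hour: float) -> int:
    """Rough record count at risk, assuming a steady entry rate."""
    hours = window.total_seconds() / 3600
    return round(hours * records_per_hour)
```

For example, a last good backup at 22:00 and an outage at 14:30 the next day give a 16.5-hour window; at an assumed 40 records per hour, roughly 660 records would need to be re-collected or recovered.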

Identify and Perform the Recovery Methods for the Outage

  • Is there a documented recovery plan for that system?
  • What is the defined maximum recovery time (before the business continuity is affected)?
  • What resources need to be assigned for the recovery and restoration?
  • What manual processes (workarounds) need to be activated?
  • What hardware, software, environmental, procedural and personnel requirements are needed for the recovery and restoration?
  • What are the restoration activities required once data recovery is complete?
  • How will the data from the workaround processes be integrated / merged with the recovered system?
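
A minimal sketch of merging workaround data into the recovered system, assuming each record has a unique ID and is represented as a simple dictionary. The policy shown (add new records, never silently overwrite, flag conflicts for manual review) is one reasonable choice, not a regulatory requirement; `merge_workaround_records` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_workaround_records(
    recovered: Dict[str, Record],
    workaround: Dict[str, Record],
) -> Tuple[Dict[str, Record], List[str]]:
    """Merge records captured manually during the outage into the restored data set.

    Records found only in the workaround log are added. Records present in both
    sources with different contents are left unchanged and their IDs are returned
    for manual review. The recovered data set passed in is not modified.
    """
    merged = dict(recovered)
    conflicts: List[str] = []
    for record_id, record in workaround.items():
        if record_id not in merged:
            merged[record_id] = record
        elif merged[record_id] != record:
            conflicts.append(record_id)
    return merged, sorted(conflicts)
```

Returning conflicts instead of resolving them keeps a human decision, and a documented record of it, in the loop, which fits the documentation questions that follow.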

Identify and perform the process for the documentation of the outage and recovery activities

  • Are there any gaps between the recovered data and the workaround processes? (lost data)
  • Do the gaps represent a risk to product, regulatory requirements, or end-users?
  • What reporting or records are required to document the outage, recovery and restoration activities for historical / regulatory purposes?
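
If records carry sequential numbers (for example, batch or lot numbers; an assumption that will not hold for every system), gaps between the recovered data and the workaround records can be found by listing the numbers that appear in neither source. The helper below is hypothetical:

```python
from typing import Iterable, List, Optional


def find_missing_sequence_numbers(
    recovered_ids: Iterable[int],
    workaround_ids: Iterable[int],
    expected_last: Optional[int] = None,
) -> List[int]:
    """List sequence numbers absent from both the recovered data and the workaround records.

    The range checked runs from the lowest number seen to expected_last
    (for example, the last number issued), or to the highest number seen if not given.
    """
    seen = set(recovered_ids) | set(workaround_ids)
    if not seen:
        return []
    last = max(seen) if expected_last is None else expected_last
    return [n for n in range(min(seen), last + 1) if n not in seen]
```

For instance, if records 1 to 100 were recovered and the workaround log holds 106 to 110, records 101 to 105 are the gap whose risk to product, regulatory requirements, or end-users must be assessed.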

After the Recovery and Restoration

What was the root cause of the outage?

  • Can it be prevented from occurring again?
  • What redundancies could prevent or reduce the effects of future outages?
  • Did all of the workarounds perform properly / adequately?
  • Did all of the restoration and recovery activities perform properly / adequately?
  • What was the cost of the outage in terms of equipment, personnel, and operational losses?
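
The cost question can be organized as a simple sum of the three categories in the last bullet. All field names, rates, and figures in this sketch are hypothetical placeholders, not industry benchmarks:

```python
from dataclasses import dataclass


@dataclass
class OutageCosts:
    """Post-incident cost summary (hypothetical structure)."""
    equipment: float        # replacement hardware, rented equipment, media, etc.
    personnel_hours: float  # staff hours spent on recovery and workarounds
    hourly_rate: float      # assumed blended labor rate
    downtime_hours: float   # hours the affected operation was unavailable
    loss_per_hour: float    # assumed operational loss per hour of downtime

    def personnel(self) -> float:
        return self.personnel_hours * self.hourly_rate

    def operational(self) -> float:
        return self.downtime_hours * self.loss_per_hour

    def total(self) -> float:
        return self.equipment + self.personnel() + self.operational()
```

With $5,000 in equipment, 120 staff hours at $60 per hour, and 10 hours of downtime at $2,500 per hour, `total()` returns 37200, of which most is operational loss; that kind of breakdown helps justify the redundancies asked about above.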

Conclusion

Today’s businesses collect and use data for almost every process and action performed day to day. Loss of access to that data can degrade, cripple, or halt business operations, putting the company’s products, services, or, in critical cases, its very existence at risk. Business continuity planning and risk assessments, disaster recovery plans with test drills, and data backup and recovery procedures will help a company ensure its survival in this dangerous world of disasters!

Bibliography

Smith, D. M. (2003 and 2011). The Cost of Lost Data. Retrieved from Pepperdine University – Graziadio School of Business and Management: http://gbr.pepperdine.edu/2010/08/the-cost-of-lost-data/

DRG – Data Recovery Group. (n.d.). An Ebook on Data Recovery and Data Protection. Retrieved from DRG – Data Recovery Group: http://www.datarecoverygroup.com/sites/default/files/datarecoveryebook.pdf

DRG – Data Recovery Group. (n.d.). Data Recovery Checklist for Small Business. Retrieved from www.datarecoverygroup.com: http://www.datarecoverygroup.com/data-recovery-checklist-small-businesses

Ellegent Systems Inc. (2012). Ellegent Systems Inc. – Backup and Recovery. Retrieved from Ellegent Systems Inc.: http://www.ellegent.com/docs/Backup/BDR.pptx

Iron Mountain. (2011, Dec 09). Hidden Data Recovery Costs, Revealed. Retrieved from Iron Mountain – Knowledge Base: http://www.ironmountain.com/Knowledge-Center/Reference-Library/View-by-Document-Type/General-Articles/H/Hidden-Data-Recovery-Costs-Revealed.aspx

Kroll Ontrack. (n.d.). Kroll Ontrack – Emergence of Business Continuity to Ensure Business and IT Operations. Retrieved from www.krollontrack.com: http://www.krollontrack.com/library/odrcontinuitywp_krollontrack2011.pdf

Kroll Ontrack. (2010, Jan 01). The Data Recovery Blog. Retrieved from Kroll Ontrack: http://www.thedatarecoveryblog.com/tag/remote-data-recovery/

Kroll Ontrack. (2013). The Data Recovery Blog. Retrieved from The Data Recovery Blog: http://www.thedatarecoveryblog.com/

READY.Gov / FEMA. (2012). Business Continuity Plan – Ready.gov. Retrieved from Ready.gov: http://www.ready.gov/business/implementation/continuity

Armour, S. (2006, Jun 12). USA Today – Lost Digital Data Cost Businesses Billions. Retrieved from USA Today: http://usatoday30.usatoday.com/tech/news/computersecurity/2006-06-11-lost-data_x.htm

Technology on Premises. (n.d.). The Causes and Consequences of Data Loss to Small Business. Retrieved from www.topitservice.com: http://www.topitservice.com/content/causes-and-consequences-data-loss-small-business

Author

Louis Rutledge

Mgr., Services Development, MasterControl