Business Continuity and Disaster Recovery (BC/DR) planning is an essential part of cyber preparedness at any company. It is also the kind of thing most people put off. But it is becoming increasingly important in insurance carriers' assessments for cyber liability and errors and omissions (E&O) policies.
Making sure that all of your organization’s stakeholders share a common understanding of the BC/DR decisions that have been made, and of how those decisions affect how long it will take to get back to business as usual, is just as important as the recovery framework itself.
Organizations that don’t ask the right questions about how they will recover from catastrophic events such as ransomware, and that don’t make sure all stakeholders understand the answers, spend more and take longer to recover from disasters and outages.
The Key Expectations to Set
- Your Recovery Point Objective (RPO): the point in time to which your data will be restored, which determines how complete the restoration is and what users will experience afterward.
- Your Recovery Time Objective (RTO): how long it will take, under the realistic conditions of an emergency, to get back to that recovery point.
The Key Questions to Ask and Answer
- How do we inventory the systems we refer to as “business critical”?
  - Are we certain that the inventory is comprehensive?
  - Have we contextualized the relative value of each system?
    - If so, how? If not, why not?
- What is our Recovery Point Objective for each type of business-critical system?
  - Servers (cloud or otherwise)?
    - Customer relationship management, ticketing, communications, etc.
  - Data entry systems?
    - When can our call center and logistics systems, for example, be back online?
  - Desktop, laptop, and mobile computers?
    - Can our employees answer an email, work on a spreadsheet, etc.?
- What is our Recovery Time Objective for each type of business system?
  - Are we backing up or snapshotting our systems?
    - The answer is the difference between “We’re back up and running as if nothing ever happened” and “Now we have to reload everything from backup.”
- Have we ensured that our recovery systems are completely air-gapped and separated from our production systems in all respects?
  - A sadly common problem is ransomware spreading to, and encrypting, the backup files.
  - By “air gap,” do we also mean “on another tectonic plate,” that is, in a geographically isolated area that can’t be taken down by the same regional disaster or conflict?
    - We should.
- Have we tested these assumptions?
  - RPO and RTO are great, but how certain are we of our ability to meet these objectives?
  - Are we testing this?
    - How often?
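The inventory and testing questions above lend themselves to a simple tracking structure. The following is a minimal sketch, not a prescribed tool: it assumes each business-critical system is recorded with its RPO, its RTO, and the data loss and restore time actually measured in the most recent recovery test. The names `CriticalSystem` and `review`, the field names, and the example systems and durations are all hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class CriticalSystem:
    """One entry in a hypothetical inventory of business-critical systems."""
    name: str
    category: str                    # e.g. "server", "data entry", "endpoint"
    rpo: timedelta                   # maximum acceptable data loss
    rto: timedelta                   # maximum acceptable time to restore
    tested_data_loss: Optional[timedelta] = None     # measured in the last test
    tested_restore_time: Optional[timedelta] = None  # measured in the last test


def review(inventory: list[CriticalSystem]) -> list[str]:
    """Return one finding per objective that is untested or was missed in testing."""
    findings = []
    for s in inventory:
        # An objective that has never been tested is an assumption, not a plan.
        if s.tested_data_loss is None or s.tested_restore_time is None:
            findings.append(f"{s.name}: RPO/RTO never tested")
            continue
        if s.tested_data_loss > s.rpo:
            findings.append(f"{s.name}: lost {s.tested_data_loss} of data, RPO is {s.rpo}")
        if s.tested_restore_time > s.rto:
            findings.append(f"{s.name}: restore took {s.tested_restore_time}, RTO is {s.rto}")
    return findings


# Illustrative values only.
inventory = [
    CriticalSystem("CRM", "server",
                   rpo=timedelta(hours=1), rto=timedelta(hours=4),
                   tested_data_loss=timedelta(minutes=30),
                   tested_restore_time=timedelta(hours=6)),
    CriticalSystem("Call center", "data entry",
                   rpo=timedelta(minutes=15), rto=timedelta(hours=2)),
]

for finding in review(inventory):
    print(finding)
```

The point of recording measured values next to the objectives is that an RPO or RTO with no test result behind it shows up as a finding, rather than passing silently as a promise.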
Speak Clearly; Ensure All Parties Are Understood
“We can restore from backups within six hours” is a great case in point. To a non-technical manager, that probably sounds like “everything will be back the way it was six hours after a horrible event.” But that may not be what the IT leaders were promising. Words that mean different things to different people lead to confusion:
The critical question is: “restore” to what standard? There can be tremendous variation between what a business manager expects “restore” to mean and what an IT manager means by “restore” in the context of the conversation. For example, does it mean that everyone can turn on their machines and pick up exactly where they left off before the incident?
Or does it mean that users start with a completely fresh, vanilla Windows install and then download (“restore”) all their data from backup?
And does the same answer apply to desktops, laptops, mobile devices, and applications running in the cloud or on-premises, or does it differ for each of those categories? It’s likely the latter. Do the business leaders understand those differences?
IT professionals can give these answers without any intent to deceive; in fact, they may give them with only the best of intentions. Often, when budgets are tight and senior leadership doesn’t have the time or patience to dig into the options and fully understand the nuances of the answers before approving something, IT pros may diligently cut corners to accomplish a critical mission (recovery) as far as they legitimately can.
So when the defecation hits the ventilation, and managers who heard the deluxe response discover that their IT teams were actually describing the no-frills package, there can be a row.
Finally, you need a specific listing of how those objectives will be met in different areas of your IT fabric. Restoring a business-critical server (cloud or otherwise) to its pre-incident condition as quickly as possible is objectively more important than restoring non-business-critical systems. But everyone needs a shared understanding of which systems are “business critical,” and you need to balance your restoration priorities so that you get back to business as fast as you can.
For example, if all your internal servers are back up and running exactly as they were pre-event within an hour, but it takes eight hours to restore all the internal systems that talk to those servers, that’s potentially less valuable than having both back within three hours.
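The arithmetic behind that example can be sketched in a few lines. This is a minimal illustration under stated assumptions: each system’s restore-completion time (in hours after the incident) is known, a system is only usable once it and everything it depends on are usable, and the dependency graph has no cycles. The function names `usable_at` and `back_to_business` and the two scenario dictionaries are hypothetical; the hours come from the example above.

```python
def usable_at(system: str, restored_at: dict[str, float],
              depends_on: dict[str, list[str]]) -> float:
    """Hours after the incident until `system` is actually usable.

    A system counts as usable only when it has been restored AND every
    system it depends on is usable. Assumes no dependency cycles.
    """
    deps = depends_on.get(system, [])
    return max([restored_at[system]] +
               [usable_at(d, restored_at, depends_on) for d in deps])


def back_to_business(restored_at: dict[str, float],
                     depends_on: dict[str, list[str]]) -> float:
    """Business as usual resumes only when the last system becomes usable."""
    return max(usable_at(s, restored_at, depends_on) for s in restored_at)


# The internal systems talk to (depend on) the servers.
depends_on = {"internal systems": ["servers"]}

# Scenario 1: servers back in 1 hour, internal systems take 8 hours.
fast_servers = {"servers": 1, "internal systems": 8}
# Scenario 2: both back within 3 hours.
balanced = {"servers": 3, "internal systems": 3}

print(back_to_business(fast_servers, depends_on))
print(back_to_business(balanced, depends_on))
```

Under these assumptions, the one-hour server restore in the first scenario doesn’t shorten the time to business as usual at all: the organization is still waiting on the slowest system it needs, which is why the balanced plan comes out ahead.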