While continuity of business operations is paramount for any organization, disruptions are inevitable. They have many causes, ranging from cyberattacks (including ransomware) and system failures to natural disasters and human error. Business Continuity and Disaster Recovery (BCDR) is necessary to mitigate these risks and restore service availability as quickly as possible. Without a BCDR plan in place, organisations may face significant financial losses, reputational damage, and regulatory penalties (for example, if they are subject to the EU's Digital Operational Resilience Act, DORA).
What are Business Continuity, Disaster Recovery and High Availability?
Disaster Recovery (DR) comprises a set of tools, backups, replications, and processes used to recover workloads in case of disruptions. Business continuity (BC) encompasses processes, tools, and architecture built on top of DR to ensure the continued operation of critical business services during an outage. BCDR helps organizations recover from outages and quickly resume mission-critical functions.
From our experience, BCDR is often confused with high availability (HA), which involves distributing an application or workload across redundant physical or virtual infrastructure. Some organisations mistakenly believe that enabling redundant components for critical systems constitutes having BCDR in place. However, BCDR and HA play distinct and equally critical roles in maintaining IT resilience. While HA aims to prevent downtime, BCDR ensures timely recovery when unexpected disruptions occur.
5 must-haves for your BCDR plan
Another area that causes confusion is the BCDR plan itself. Organisations often believe that having the right tools in place means they are prepared for any disruptive event. However, a proper approach to BCDR should include not only comprehensive solutions covering backups, replication, and isolation, but also a well-documented and tested plan for recovering critical data and business applications.
At a minimum, your BCDR plan should include the following:
- Detailed documentation for all production workloads and their dependencies. This is a critical starting point for an effective disaster recovery plan. Without knowing exactly what needs protection, there’s a risk of not recovering all essential data in the event of a disaster.
- A list of potential threats that could cause downtime. It’s essential to understand the risk scenarios your organization is vulnerable to and focus on them, as you can’t be prepared for everything.
- Contact information for all application owners and support vendors, so you can reach out to them in case of emergency.
- Regularly validated and documented backups. Nothing is worse than experiencing an outage and realizing that your backups are corrupted. Testing your ability to restore in an emergency is a fundamental aspect of any backup strategy.
- Defined and fully agreed-upon recovery point objectives (RPOs), which set the maximum amount of data loss the business can tolerate, measured as the time since the last recoverable copy, and recovery time objectives (RTOs), which set the maximum acceptable time to return the IT environment to a fully working state after the initial disruption.
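To make the RPO and RTO definitions concrete, here is a minimal Python sketch that compares the outcome of an incident against agreed targets. The class name, function name, and all timestamps are hypothetical and chosen purely for illustration; they are not part of any Azure or vendor tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Agreed targets for one workload (values used below are illustrative)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def evaluate_recovery(objectives, last_good_backup, outage_start, service_restored):
    """Return the actual data-loss window and downtime, and whether each met its target."""
    data_loss = outage_start - last_good_backup
    downtime = service_restored - outage_start
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Hypothetical incident: last good backup at 08:00, outage at 10:30, service back at 13:00.
targets = RecoveryObjectives(rpo=timedelta(hours=4), rto=timedelta(hours=2))
result = evaluate_recovery(
    targets,
    last_good_backup=datetime(2024, 6, 1, 8, 0),
    outage_start=datetime(2024, 6, 1, 10, 30),
    service_restored=datetime(2024, 6, 1, 13, 0),
)
print(result["rpo_met"], result["rto_met"])  # True False
```

In this made-up incident, 2.5 hours of data were lost (within the 4-hour RPO), but service took 2.5 hours to return (outside the 2-hour RTO). Running the same comparison after each recovery test is one simple way to check whether the agreed objectives are realistic.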
If you’d like to learn more about a BCDR plan, please watch our BCDR webinar recording.
What are cyber resilience and a cyber resilience plan?
An important factor when considering your BCDR strategy within Azure is cyber resilience. Cyber resilience is the ability to respond to a cyber attack, resume business operations quickly, and protect valuable electronic data. It is intrinsically linked to your backup strategy, as backups are a key component of recovering from a cyber attack. The cyber resilience mindset is one of planning for the eventuality of a cyber attack, as opposed to simply trying to stop it from happening in the first place.
When considering a cyber resilience plan, it is important to understand the business impact of a cyber attack and how long your business can survive without access to critical data. Understanding which applications and data are critical is equally important when devising a plan. Typically, in the event of a ransomware attack, a disaster recovery solution alone will not protect you, as the encrypted data in your primary site has most likely been replicated to your secondary site. In this instance, you will need to restore from backups to an isolated recovery environment, because production environments and infrastructure may be inaccessible for an extended period until cyber insurers, law enforcement, and other parties have completed their analysis. A separate recovery area (such as a new, isolated Azure tenancy) is a must. In this situation, tools such as Azure Site Recovery or other replication tools are of limited value.
Ensuring backup data immutability
To ensure that your organization's backup data is recoverable, your backups must be immutable and indelible: once written, they cannot be altered or deleted until their retention period has passed. Whether you achieve this by availing of Azure native backup services or by using leading third-party tools such as Commvault, this is a critical area on which to focus your efforts.
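As a rough illustration of what "immutable and indelible" means in practice, the toy Python class below refuses to overwrite a backup copy at any time and refuses to delete it before its retention period ends. It is a conceptual sketch only: `ImmutableBackupStore` and its methods are hypothetical names, and real products enforce these rules in the storage platform itself rather than in application code.

```python
from datetime import datetime, timedelta


class ImmutableBackupStore:
    """Toy write-once store: copies cannot be overwritten, or deleted before retention expires."""

    def __init__(self, retention: timedelta):
        self._retention = retention
        self._copies = {}  # name -> (data, locked_until)

    def write(self, name: str, data: bytes, now: datetime) -> None:
        if name in self._copies:
            raise PermissionError(f"{name} already exists and cannot be overwritten")
        self._copies[name] = (data, now + self._retention)

    def delete(self, name: str, now: datetime) -> None:
        _, locked_until = self._copies[name]
        if now < locked_until:
            raise PermissionError(f"{name} is locked until {locked_until.isoformat()}")
        del self._copies[name]

    def read(self, name: str) -> bytes:
        return self._copies[name][0]


# Hypothetical usage: a 30-day retention lock blocks an early delete attempt.
store = ImmutableBackupStore(retention=timedelta(days=30))
t0 = datetime(2024, 6, 1)
store.write("db-full", b"snapshot", now=t0)
try:
    store.delete("db-full", now=t0 + timedelta(days=1))
except PermissionError as err:
    print("blocked:", err)
```

The point of the sketch is that even a caller with full access to the store cannot remove or change a copy during the lock period, which is the property that protects backups from an attacker holding stolen credentials.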
It is also crucial to ensure your backup system is isolated and hidden, with all the relevant access controls enabled, to make it difficult for a bad actor to gain access. If an attacker does gain access to your backups, you may be left with little choice but to pay the ransom and hope that your data is returned safely. Storing at least one copy of backup data in an isolated, air-gapped tertiary environment, such as a separate Azure cloud availability zone in a separate Azure tenancy or a completely independent cloud-hosted storage service, can be a key strategy for protecting against ransomware.
In addition, backup solutions equipped with early warning capabilities that detect unusual activity can provide an early indication of an attack (sometimes before perimeter security systems identify it). This gives you an opportunity to stop the attack at source or at least limit its impact.
Once an organisation has identified an attack and is either trying to determine the blast radius or actively recovering from that attack, it must scan its backup data to validate that it is free from contamination. This is a key process so that you do not restore encrypted data or unwittingly believe you are protected when you are in fact compromised. Tools such as Microsoft Defender or best-in-class third-party tools such as Commvault Threat Scan can be very helpful here, helping to automate the sanitisation and recovery of workloads to an isolated recovery environment (cleanroom) and their return from there to production.
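The selection step described above, choosing which backup to restore once scanning is complete, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: `RecoveryPoint`, `scan_clean`, and `latest_clean_point` are hypothetical names, and the scan result is assumed to come from whichever threat-scanning tool you use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RecoveryPoint:
    taken_at: datetime
    scan_clean: bool  # result of a threat scan run by whichever tool you use


def latest_clean_point(points, compromise_time=None) -> Optional[RecoveryPoint]:
    """Pick the newest recovery point that scanned clean and, if the time of
    compromise is known, was taken before it."""
    candidates = [
        p for p in points
        if p.scan_clean and (compromise_time is None or p.taken_at < compromise_time)
    ]
    return max(candidates, key=lambda p: p.taken_at, default=None)


# Hypothetical history: the attacker is believed to have got in at 18:00 on 2 June.
points = [
    RecoveryPoint(datetime(2024, 6, 1), True),
    RecoveryPoint(datetime(2024, 6, 2), True),
    RecoveryPoint(datetime(2024, 6, 3), False),
    RecoveryPoint(datetime(2024, 6, 4), True),  # clean scan, but taken after the compromise
]
chosen = latest_clean_point(points, compromise_time=datetime(2024, 6, 2, 18, 0))
print(chosen.taken_at.date())  # 2024-06-02
```

Excluding points taken after the suspected compromise, even when they scan clean, is a deliberately cautious design choice in this sketch, since a scan can miss dormant malware.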
After building a cyber resilience incident response plan, it is also important to test the process regularly, for example by running quarterly cyber recovery tests of critical applications. These tests provide demonstrable evidence of cyber recovery readiness and let you proactively resolve any issues you uncover.
Ergo can help with all of these requirements, including advisory, 24×7 managed services and incident response services.
Azure BCDR strategies
Cloud solutions offer rapid recovery through automated backups and replication across multiple data centres. In the context of Microsoft Azure, three main scenarios are supported:
- The Protect in Azure scenario is designed for workloads that run in Azure. In simple terms, this method leverages one or more additional Azure regions as failover locations and may involve a combination of Azure tools, services, application architecture, and operational processes to enable self-service backups and restores. Azure provides a built-in service that ensures secure backup for Azure resources.
- The Protect to Azure scenario is designed for workloads running on-premises.
  - Azure Backup offers greater resilience compared to most data centre backup options. Additionally, it’s a cost-effective option since you avoid upfront costs for facilities, hardware, and electricity, and pay for failover resources only when they are used.
  - Data protection can be accomplished using tools from Microsoft partners such as Commvault, Rubrik, and Veeam, or with Azure Site Recovery. Azure Site Recovery replicates data from on-premises disks into the cloud. The replication is asynchronous, meaning data is written to the replica after it has been written to primary storage. Compared with synchronous replication, this approach requires substantially less bandwidth and works effectively over long distances.
- The SaaS to Azure scenario. Despite common assumptions, Office 365 or SharePoint data isn’t automatically backed up, and organisations need to take responsibility for creating a copy of their data. At the time of writing, Microsoft 365 Backup is in preview, with rollout to organizations beginning in mid-2024. It will offer rapid recovery from typical business continuity and disaster recovery scenarios. Moreover, replicated copies of data will be spread across geographically diverse data centres, providing automatic protection against physical disasters and seamless failover to live active copies.
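The asynchronous replication mentioned in the Protect to Azure scenario has a direct consequence for your RPO: any write acknowledged by the primary but not yet shipped to the replica would be lost if the primary failed at that moment. The toy Python model below illustrates this behaviour. It is a conceptual sketch, not a representation of how Azure Site Recovery is implemented, and `AsyncReplicatedStore` is a hypothetical name.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy model: writes are acknowledged by the primary first and shipped to the replica later."""

    def __init__(self):
        self.primary = []
        self.replica = []
        self._pending = deque()  # writes not yet shipped to the replica

    def write(self, record):
        self.primary.append(record)   # acknowledged immediately
        self._pending.append(record)  # replicated later, in order

    def replicate(self, max_records=None):
        """Ship pending writes to the replica (all of them if max_records is None)."""
        count = len(self._pending) if max_records is None else min(max_records, len(self._pending))
        for _ in range(count):
            self.replica.append(self._pending.popleft())

    def unreplicated(self):
        """Writes that would be lost if the primary failed right now."""
        return list(self._pending)


store = AsyncReplicatedStore()
for i in range(5):
    store.write(f"txn-{i}")
store.replicate(max_records=3)  # the link only kept up with the first three writes
print(store.replica)            # ['txn-0', 'txn-1', 'txn-2']
print(store.unreplicated())     # ['txn-3', 'txn-4']
```

The size of the pending queue at the moment of failure is, in effect, the data-loss window, which is why replication lag should be monitored against the agreed RPO.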
Conclusion
BCDR is essential for any organisation that prioritizes the availability and integrity of its services. By understanding and implementing Azure Backup and Azure Site Recovery, companies can safeguard their operations against the unexpected, ensuring long-term success and resilience.
As an Azure Expert MSP, Ergo provides Disaster Recovery and Cyber Resilience Services that are designed to help organisations recover from unexpected attacks or disasters and minimise downtime. Our services cover all aspects of BCDR, including backup, replication, and recovery of critical data and applications.
Talk to us about your requirements
Ensure your business is prepared with a robust BCDR plan tailored to your needs. Fill out the form below to get in touch with our experts and start building a resilient future today!