CrowdStrike outage: 5 key points to strengthen data resilience
Chanda
August 5, 2024


On July 19, 2024, a faulty update to CrowdStrike’s “Falcon Sensor,” its agent for real-time threat detection and endpoint protection, caused a system crash that affected 8.5 million Microsoft Windows devices and triggered widespread IT and operational disruptions worldwide. Although the incident was not caused by a cyberattack or malware, it underscores the importance of a comprehensive and reliable backup and disaster recovery strategy to minimize disruptions to business operations.

CrowdStrike causes immediate global impact

The outage was first reported in Australia, and the “blue screen of death” soon spread to Windows devices worldwide, severely disrupting users, companies, and critical service providers. By that afternoon, approximately 2,600 flights in the U.S. had been canceled and more than 4,200 flights were affected globally, with airlines resorting to manual check-ins, according to the Wall Street Journal.

How long RTOs (recovery time objectives) impact business operations

Following the incident, CrowdStrike provided technical support and released a fix to help restore system operations. However, many organizations’ systems could not be recovered automatically by a repair program. When that happens, IT admins must manually boot each affected device into Safe Mode and delete the problematic CrowdStrike update files.
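For illustration, the manual deletion step can be sketched in Python. CrowdStrike’s published workaround was to delete files matching `C-00000291*.sys` from `%WINDIR%\System32\drivers\CrowdStrike`. The function below takes that directory as a parameter so the logic can be exercised safely on a test folder. The function name `remove_faulty_channel_files` and the `dry_run` option are hypothetical, and in practice the deletion was done by hand from Safe Mode or the recovery environment rather than with a script like this.

```python
from pathlib import Path

# Filename pattern from CrowdStrike's published workaround for the faulty
# channel file; it is specific to the July 19, 2024 incident.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find (and, unless dry_run is True, delete) the faulty channel files.

    On an affected host, driver_dir would be
    %WINDIR%\\System32\\drivers\\CrowdStrike; it is a parameter here
    (hypothetical helper) so the logic can be tested on any directory.
    Returns the matching files, sorted by path.
    """
    matches = sorted(p for p in driver_dir.glob(FAULTY_PATTERN) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()  # delete only files that matched the pattern
    return matches
```

Running it first with the default `dry_run=True` only lists the files that would be removed, so an admin can confirm the matches before anything is deleted.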

Although Microsoft released a recovery tool the next day that automatically deleted the faulty files, it still required booting each affected device into WinPE from a USB drive, which was laborious and lengthened the recovery process. Downtime leads to operational disruptions, lost productivity, additional costs, increased compliance risks, and, ultimately, a negative customer experience and a tarnished corporate reputation.
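A back-of-the-envelope estimate shows why per-device manual recovery stretches recovery time. The sketch below assumes devices are split as evenly as possible across technicians working in parallel, one device at a time, with no travel or waiting time. The function name and all inputs are hypothetical illustrations, not figures from the incident.

```python
import math


def manual_recovery_hours(devices: int, minutes_per_device: float, technicians: int) -> float:
    """Estimate wall-clock hours to remediate devices by hand.

    Assumes each technician fixes one device at a time and the workload is
    split as evenly as possible, so the busiest technician sets the total.
    """
    if devices < 0 or minutes_per_device < 0 or technicians <= 0:
        raise ValueError("devices and minutes must be >= 0; technicians must be > 0")
    # The busiest technician handles ceil(devices / technicians) machines.
    devices_per_tech = math.ceil(devices / technicians)
    return devices_per_tech * minutes_per_device / 60


# Hypothetical fleet: 2,000 devices, 15 minutes each, 10 technicians.
# manual_recovery_hours(2000, 15, 10) -> 50.0
```

Even with these modest assumed numbers, the estimate is 50 hours of hands-on work, far beyond an RTO measured in hours. Reducing the per-device step, or avoiding it through recovery from backups, is what shortens the total.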

Build a strong data protection plan for continuity 

Comprehensive backups: A backup strategy that regularly covers all data sources and devices, leaving no data isolated, is crucial for businesses, especially those operating across multiple platforms or tools.

Regular restoration drills: Equipment and system failures are rarely predictable. Regularly testing whether backup data can actually be restored is essential for verifying that the organization’s disaster recovery plan is effective and that backups are available when needed.

Instant VM recovery: Running backed-up systems as virtual machines brings services back online as quickly as possible, reducing downtime and maintaining business continuity.

Cross-platform restoration: In CrowdStrike’s case, only Windows devices were affected. Businesses can reduce downtime and the risk of data loss by ensuring that all data, applications, and systems can be recovered and brought back online across multiple environments.

Off-site backup and recovery: In addition to backing up on-site data, keeping an off-site backup mitigates the risk of data loss when the primary site is affected. A company that had deployed an off-site cloud backup before the CrowdStrike event could have resumed services from that off-site copy.
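Several of these points can be turned into automated checks. The sketch below audits a hypothetical backup inventory against three rules drawn from the points above: every device has a recent backup, each device’s recent backups include an off-site copy, and a restore test has been run recently. The `BackupCopy` record, its field names, the `audit_backups` function, and the 24-hour and 30-day thresholds are all assumptions for illustration, not settings from any Synology product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupCopy:
    device: str             # hypothetical device identifier
    completed_at: datetime  # when this backup copy finished
    offsite: bool           # True if stored at another site or in the cloud


def audit_backups(
    devices: List[str],
    copies: List[BackupCopy],
    last_restore_test: Optional[datetime],
    now: datetime,
    max_backup_age: timedelta = timedelta(hours=24),  # assumed policy window
    max_test_age: timedelta = timedelta(days=30),     # assumed drill interval
) -> List[str]:
    """Return policy violations; an empty list means the inventory complies."""
    problems: List[str] = []
    for device in devices:
        # Comprehensive coverage: every device needs a backup inside the window.
        recent = [c for c in copies
                  if c.device == device and now - c.completed_at <= max_backup_age]
        if not recent:
            problems.append(f"{device}: no backup within the allowed window")
        # Off-site protection: at least one of those recent copies is off-site.
        elif not any(c.offsite for c in recent):
            problems.append(f"{device}: no recent off-site copy")
    # Restoration drills: a restore test must have run within max_test_age.
    if last_restore_test is None or now - last_restore_test > max_test_age:
        problems.append("restore test overdue")
    return problems
```

Requiring the off-site copy to fall inside the same window as the local backup is a deliberately strict choice. An organization that ships off-site copies less often would give that rule its own, longer threshold.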

Backups are the key to data resilience

Having a secure backup and disaster recovery plan is key to data resilience and a crucial step for any business pursuing digital transformation. The CrowdStrike incident highlights the importance of establishing a robust backup strategy and testing backups regularly to maintain continuity in the face of unforeseen events.
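A backup test is only meaningful if the restored data is checked against the source. One common approach, sketched below, compares SHA-256 digests file by file. It assumes the backup has already been restored into a separate directory by the backup software. The function names `sha256_of` and `verify_restore` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its restored counterpart.

    Returns relative paths (POSIX style) that are missing from the restore
    or whose contents differ; an empty list means the restore matched.
    """
    mismatches: list[str] = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            mismatches.append(rel.as_posix())
    return mismatches
```

Recording the returned list after each drill gives a simple, auditable record that backups were not only taken but could actually be recovered intact.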
