On Friday, a faulty update from CrowdStrike, a leading endpoint security software provider, led to a major global outage affecting Windows devices and Microsoft services, including Office 365. The update triggered the infamous ‘blue screen of death’ (BSOD) on Windows devices worldwide, leaving users unable to access their systems and causing disruptions across various sectors.
What Happened?
The issue began following a CrowdStrike update that caused the BSOD error, displaying the message, “Your device ran into a problem and needs to restart.” Restarting the device, however, led to a boot loop that prevented access to the system. Microsoft identified the error by the STOP code “PAGE_FAULT_IN_NONPAGED_AREA,” which was traced back to a failure in a CrowdStrike agent system file.
Never trusted this antivirus software but didn't know it would come to this. #Crowdstrike https://t.co/q9jzLqro0F
— Edward Igarashi (@edward_igarashi) July 19, 2024
Impact of the CrowdStrike Error
The outage had widespread effects, as reported by the software status monitoring website Downdetector. Several Microsoft services, including the Microsoft Store and Microsoft 365, were affected. The disruption extended to critical services like 911 emergency services in multiple US states, banks, airports, and IT companies.
Aviation Sector Hit Hard
Berlin Airport suspended all flights due to the technical problem, halting check-in and flight services until 10 am local time. In the US, major airlines such as Delta, United, and American Airlines grounded their flights, with the Federal Aviation Administration (FAA) citing communication issues as the cause. India’s IndiGo Airlines and other Indian carriers also faced long waiting lines and booking errors, attributing the problem to Microsoft Azure.
Media and Financial Services Affected
The outage also impacted media outlets like the UK’s Sky News and CBBC, and Australia’s ABC News. The London Stock Exchange (LSE) faced issues that prevented the RNS news service from publishing on its website.
CrowdStrike and Microsoft’s Response
Both CrowdStrike and Microsoft issued statements following the outage. Microsoft acknowledged the problem and stated that several mitigation actions were in progress, focusing on redirecting impacted traffic to healthy systems. A Microsoft spokesperson indicated that the issue arose at 6 pm ET, affecting customers in the Central US region.
CrowdStrike, in a statement behind a registration wall and on their subreddit, acknowledged the issue and provided a workaround. Users were advised to boot into Safe Mode or the Windows Recovery Environment, navigate to the CrowdStrike directory, and delete the file matching “C-00000291*.sys” to resolve the issue.
Fixing the BSOD Error
CrowdStrike provided a detailed four-step process to fix the BSOD error:
- Boot Windows into Safe Mode or the Windows Recovery Environment.
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
- Locate and delete the file matching “C-00000291*.sys”.
- Boot the host normally.
Users unable to perform these steps themselves were advised to hand the task over to their IT systems administrators.
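For administrators who need to repeat the file-deletion step on many machines, the manual procedure above can be sketched as a short script. This is only an illustration, not an official CrowdStrike tool: the directory and file pattern are taken from the workaround quoted above, while the function name `remove_matching_files` and its `dry_run` parameter are hypothetical. It also assumes a Python interpreter is available on the affected host, which is often not the case in Safe Mode or the Windows Recovery Environment, where the file would usually be deleted by hand.

```python
from pathlib import Path

# Directory and file pattern as quoted in CrowdStrike's published workaround.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_FILE_PATTERN = "C-00000291*.sys"


def remove_matching_files(directory=CROWDSTRIKE_DIR, dry_run=True):
    """Return files in `directory` matching the pattern.

    The files are deleted only when dry_run is False, so the default
    call just reports what would be removed.
    (Hypothetical helper written for illustration.)
    """
    directory = Path(directory)
    if not directory.is_dir():
        # Nothing to do if the CrowdStrike directory is absent.
        return []
    matches = sorted(
        p for p in directory.glob(FAULTY_FILE_PATTERN) if p.is_file()
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Report only; pass dry_run=False to actually delete the files.
    for path in remove_matching_files(dry_run=True):
        print(f"Would delete: {path}")
```

Running the script with the default `dry_run=True` first lets an administrator confirm that only the intended file is matched before anything is removed; the host still has to be rebooted normally afterwards, as in the final step of the list.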
Resolution and Further Developments
CrowdStrike’s engineering team identified an issue related to the “Falcon Sensor” on Windows that followed a content deployment, and rolled back the changes. Despite the widespread impact, both CrowdStrike and Microsoft assured users that they were working to resolve the issue promptly.
The situation highlighted how vulnerable critical systems worldwide are to a single software update, and how far the consequences can reach. As the companies continue to monitor and address the fallout from the update, users and affected sectors are urged to stay informed through official channels and social media updates.
This incident underscores the importance of robust update testing and the need for contingency plans to mitigate such disruptions in the future.