July 29, 2024

Looking Past the Fog: Where Do We Go After the Crowdstrike-Windows Outage?

By John Rostern, Vice President, Cybersecurity & Digital Forensics, Marcum Technology

‘The fog of war’ (‘Nebel des Krieges’) is a term, coined by the great Prussian military strategist Carl von Clausewitz in 1832, that describes the loss of situational awareness in combat. On Friday, July 19, many organizations found themselves fighting to restore operations after a flawed update from the cybersecurity firm Crowdstrike. The update caused the largest global IT outage in history, as affected Microsoft Windows systems were disabled by the ‘Blue Screen of Death’ (BSOD). As the ‘fog’ created by this outage begins to lift, the IT and cybersecurity communities need to take a serious look at how a single QA failure could cause such a massive global disruption.

It is easy and expedient to criticize the Crowdstrike QA process for allowing the problematic code to be released. In truth, no QA process provides 100% assurance, and QA ‘escapes’ are inevitable. Surely many DevOps teams and cybersecurity professionals had a moment of ‘there but for the grace of God go I’ as they watched their marketing and communications departments join in the public shaming of the Crowdstrike QA function. As the event and the associated schadenfreude fade into the mist, we should take the opportunity to learn from it.

The first lesson concerns processes built on ‘blind trust’. The update in question was installed automatically, with no action required and no opportunity for intervention. This ‘feature’, intended to reduce administrative workload and ensure the timely application of security updates, was the true root cause of the problem, rather than the flawed update itself.

The process should have accounted for the possibility of a corrupt, malicious, or otherwise problematic update. The update process must be designed to allow for appropriate release testing and, if necessary, intervention, including delaying or rejecting the update.

The next lesson concerns concentration risk. The global market share of Microsoft Windows, combined with the share of those systems that also run Crowdstrike, describes a massive population of machines. Organizations whose technology infrastructure did not sit at this intersection of Microsoft Windows and Crowdstrike were not affected. Those that did sit at that intersection had unknowingly created a single point of failure that was likely never considered in their threat model or risk analysis.

This is not to say, of course, that this could not have happened with vendors other than Crowdstrike. As noted earlier, QA escapes are unavoidable. The question now becomes one of risk mitigation. In addition to the process changes suggested above, organizations should consider not relying solely on a single vendor for security software or operating systems. Diversity of suppliers, and of the associated supply chains, can play a key role in mitigating concentration risk.

In the broader context, this episode highlights the potential risks associated with the overall software supply chain. The assumed integrity of that supply chain, as evidenced by the blind trust placed in it by so many organizations, has created a large blind spot in the threat model of almost every organization.

With respect to the overall security and integrity of the software supply chain, we must also consider a scenario in which this had been the work of a malicious actor. Crowdstrike was quick, perhaps too quick, to announce that this was an accident or a mistake by one individual. What if a future event is the result of a malicious act rather than an accidental QA escape? Indeed, if the QA process allowed flawed code to be released by accident, how can we expect that process to detect deliberately malicious code?

In summary, Crowdstrike is currently the subject of well-deserved criticism for this QA failure and the subsequent global impact on operations and the economy. However, there is plenty of blame to go around and questions must be asked as to how the affected organizations placed themselves into such a vulnerable position.

If you’re facing challenges related to cybersecurity threats, breaches, and bad actors, or are interested in learning more about identifying potential threats to your organization, contact Marcum Technology today.