Thoughts on cyber risk
I want to start my first blog post by talking about risk management, partly in the context of cybersecurity, but also to bring to light how other industries I am familiar with perform risk assessments. This is probably going to be a multi-part post, as it is something I have thought about many times in the never-ending battle to justify security spending.
After I got my bachelor's in physics, my first job was at a company that did disaster modeling for nuclear power plants, with a specific focus on fire protection engineering. Like most regulations, the ones in this space exist because something very bad happened. In this case, it was a fire at the Browns Ferry Nuclear Plant.
“The fire started at BFN on March 22, 1975, by a worker using a lit candle to check for air leaks. This risky action ignited a temporary polyurethane cable penetration seal.”
This led the Nuclear Regulatory Commission (NRC) to create Appendix R to 10 CFR Part 50 (https://www.nrc.gov/reading-rm/doc-collections/cfr/part050/part050-appr.html) as guidance on mitigating the risk of losing control of systems due to fire. The idea was for every plant to have a runbook of the form "You lost control of this machine/device/circuit that is required to safely shut down the plant; here is the next step you take." This is a nice idea; contingency planning is good. The reality is that things don't usually fail in isolated silos, so one problem often triggers cascading failures. Suddenly that runbook cannot handle every permutation of failure, and the poor plant operator has a control board lighting up like a Christmas tree.
Enter HRA
There is another problem that comes with this. It turns out humans are not good in emergencies. They are so not good in emergencies that modeling tasks that require human intervention ("go move this breaker into position to get power to the pump") spawned a whole field of study: human reliability analysis (HRA).
This idea of a human simply not doing what they needed to do (for whatever reason) is tricky for a deterministic model to incorporate. It forced a rethink of the entire risk model, and so the NRC adopted NFPA 805 (https://www.nrc.gov/docs/ML0429/ML042920189.pdf).
NFPA 805: Performance-Based Standard for Fire Protection gave us a bunch of new levers to tweak when modeling risk. Now we could assign probabilities to the failure of operator actions, as well as to the initiating event itself. The previous deterministic model assumed any fire would rage uncontrollably until it hit a fire-resistant barrier, then stop. The new probabilistic model considers the energy a fire actually releases and assigns a probability that it ignites other nearby flammable materials, adding a lot more nuance (and math) to the risk numbers. Thankfully, the impact work from Appendix R mostly carried over, with impact defined as the set of components needed to safely shut down the plant.
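To make that concrete, here is a minimal sketch of how a single fire scenario might be quantified under the performance-based approach. Every number and name below is invented for illustration; a real fire PRA derives these factors from plant-specific data, fire modeling, and formal HRA methods.

```python
# A minimal sketch of quantifying one fire scenario probabilistically.
# Every number here is invented for illustration -- real analyses use
# plant-specific data, fire modeling tools, and HRA methods.

ignition_frequency = 1e-3       # fires per year in this cabinet (made up)
p_damages_cables = 0.2          # probability the fire damages the target cable tray
p_suppression_fails = 0.1       # probability the fire isn't put out in time
p_operator_error = 0.05         # HRA: operator fails the manual action
                                # (e.g., moving that breaker into position)
p_core_damage_given_failures = 0.01  # conditional core damage probability

scenario_cdf = (ignition_frequency
                * p_damages_cables
                * p_suppression_fails
                * p_operator_error
                * p_core_damage_given_failures)

print(f"Scenario core damage frequency: {scenario_cdf:.2e} per year")
# -> 1.00e-08 per year for these made-up inputs
```

The point is that each lever (suppression, operator action, barrier performance) now shows up as a factor you can estimate and argue about, rather than a binary pass/fail assumption.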
One in a million
Now what was needed was to take all the identified risks to the core components, sum up the probabilities of failure, and show a total of less than 1 in a million for core meltdown, and less than 1 in 10 million for large early release (radioactive material released into the atmosphere, the kind of event that calls for a 10-mile evacuation).
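Boiled down, the acceptance check is just a sum and two comparisons. Here is a rough sketch with invented scenario frequencies, treating the thresholds as annual frequencies:

```python
# Sketch of the acceptance check: sum per-scenario frequencies and compare
# against the targets. All scenario values are invented for illustration.

CORE_DAMAGE_LIMIT = 1e-6          # "one in a million"
LARGE_EARLY_RELEASE_LIMIT = 1e-7  # "one in ten million"

# (core damage frequency, large early release frequency) per scenario, per year
scenarios = {
    "cable_spreading_room_fire": (4e-7, 2e-8),
    "switchgear_room_fire":      (3e-7, 4e-8),
    "main_control_board_fire":   (1e-7, 1e-8),
}

total_cdf = sum(cdf for cdf, _ in scenarios.values())
total_lerf = sum(lerf for _, lerf in scenarios.values())

print(f"Total core damage frequency:     {total_cdf:.1e} (limit {CORE_DAMAGE_LIMIT:.0e})")
print(f"Total large early release freq.: {total_lerf:.1e} (limit {LARGE_EARLY_RELEASE_LIMIT:.0e})")
print("Acceptable:", total_cdf < CORE_DAMAGE_LIMIT and total_lerf < LARGE_EARLY_RELEASE_LIMIT)
```

Notice what makes this workable: the industry agreed on the thresholds up front, so the argument is only ever about the inputs, never about how much risk is acceptable.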
This leads me to the struggle I have had with cybersecurity risk: despite fire protection and cybersecurity both functioning as catastrophe modeling (low likelihood, high impact), the threshold for acceptable risk varies from company to company. In cybersecurity, whenever we encounter new and scary things, we ask "What would NIST do?" and go read what NIST has to say, so we crack open NIST SP 800-30 Rev. 1.
To be continued...