I usually start the process of assessing risk by creating a Risk Register: filling in the risk statement, description, details of the loss scenario and so on, eventually working my way to the risk-analysis part of the register.
The classic formula for risk is:
Risk = Probability × Loss
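For example, a scenario with a 10% annual probability of occurring and a potential loss of $1M carries an annualized risk of $100K.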
And in the popular modern Open FAIR™ Risk methodology:
Value of Cyber Risk ($) = SUM (LEF × LM ($))
Where:
LEF = Loss Event Frequency, the number of times per year the organization suffers a loss event
LM = Loss Magnitude, the probable monetary loss per event
Many of these parameters are similar and confusing:
“Confusing a loss event with a threat event in an analysis will lead to inaccurate results. Remember, Loss Event Frequency is how often the organization actually suffers a loss and the damaging event materializes.” (Source: The FAIR Institute)
Furthermore, since many of these parameters are not available as an accurate figure the calculation uses ranges and a statistical simulation to calculate the probable range of the risk value.
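To make the ranges-plus-simulation idea concrete, here is a minimal Monte Carlo sketch of the technique (the triangular distributions and all input ranges are my own illustrative assumptions, not FAIR-U's internal model):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N = 100_000  # number of Monte Carlo trials

# Illustrative min / most-likely / max ranges (assumed, not real data):
lef = rng.triangular(0.5, 2, 4, N)                  # loss events per year
lm = rng.triangular(50_000, 200_000, 2_000_000, N)  # $ loss per event

annual_risk = lef * lm  # per-trial Risk ($/year) = LEF x LM

for p in (10, 50, 90):
    print(f"P{p}: ${np.percentile(annual_risk, p):,.0f}")
print(f"Mean: ${annual_risk.mean():,.0f}")
```

Instead of a single risk number, the output is a distribution, from which we read off the probable range (e.g., the P10-P90 band) of the annualized risk value.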
For the Loss Magnitude (LM) side, filling in the boxes (for potential loss) isn't too bad, as they can be defined based on known business parameters such as revenue loss, cost of response and so on.
But then we come to the “head-scratcher” boxes of LEF: TEF and Vulnerability. What values do you enter in those?
Am I to hunt through the latest cyber reports and look at the breach history of my geolocation and industry? Does past history predict future scenarios? Are all food manufacturers the same? And where do I stand in this benchmarking cauldron?
So, as often happens with too many such vague parameters, you reach a state of GIGO (Garbage In, Garbage Out): either your calculations are wrong, or the range of the resulting risk score is too wide to be useful for decision-making.
In this article I propose that, in order to answer the above questions, we need a data-driven approach that combines OT breach and attack simulation (OT-BAS) with statistical simulation techniques.
By using virtual OT-BAS we can obtain data points on system vulnerability (i.e., Threat Capability and Resistance Strength) for the specific production system under consideration (SUC), rather than generic information. Combining breach-and-attack simulation with statistical simulation effectively replaces “guesstimating” with a data-driven way of feeding values into statistical simulation tools.
Using this approach, we can reduce the variance of our inputs on threat-actor capabilities and resistance strength, and thus narrow the range of risk values we get as an output.
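As a rough illustration of why narrower inputs matter, the sketch below runs the same Monte Carlo simulation twice: once with a wide "guesstimate" LEF range and once with a narrower BAS-informed range (all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=7)
N = 100_000

def risk_percentiles(lef_min, lef_mode, lef_max):
    """Monte Carlo risk for a given LEF range, holding the LM range fixed."""
    lef = rng.triangular(lef_min, lef_mode, lef_max, N)
    lm = rng.triangular(50_000, 200_000, 2_000_000, N)  # assumed LM range
    return np.percentile(lef * lm, [10, 90])

# Wide "guesstimate" LEF range vs. a narrower BAS-informed range
wide = risk_percentiles(0.1, 2, 10)
narrow = risk_percentiles(1, 2, 4)
print(f"guesstimate  P10-P90: ${wide[0]:,.0f} - ${wide[1]:,.0f}")
print(f"BAS-informed P10-P90: ${narrow[0]:,.0f} - ${narrow[1]:,.0f}")
```

The narrower input range produces a visibly tighter P10-P90 spread in the output, which is exactly what makes the result usable for decision-making.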
For example, I’ll use the FAIR-U tool on a Phishing database breach scenario supplied with the tool.
For the purpose of this post, I won't change the Loss Magnitude (LM) side, and will concentrate only on the left side, Loss Event Frequency (LEF).
Our starting point: using common sense (I hope my sense is common), with no “prior” information on what values to enter, i.e., the “guesstimate” methodology.
The initial values I entered (as min / most likely / max) are: 50-50-50 on Threat Capability, 50-50-50 on Probability of Action, 40-60-80% on Resistance Strength, and 1-2-4 on Contact Frequency.
As you can see in the table below (lines 1-5), the starting-point risk looks good, with values within the tolerable-risk area.
Next, I'll run the same statistical simulation with values taken from an OT-BAS simulation.
After entering Resistance Strength and Threat Capability values that take into account the security level achieved (SLA) at the site, the digital image of the network, and the relevant threat intelligence (TI), the resulting risk values look very different, and not for the better.
| Line | Parameter | Min | Most Likely | Max | Notes |
|---|---|---|---|---|---|
| | **Start:** | | | | |
| 1 | Resistance Strength | 40 | 60 | 80 | Vulnerability |
| 2 | Threat Capability | 50 | 50 | 50 | |
| 3 | Contact Frequency | 1 | 2 | 4 | TEF |
| 4 | Probability of Action | 50 | 50 | 50 | |
| 5 | Risk | $0 | Avg $138K | $8.1M | Within tolerable risk |
| 6 | **BAS:** | | | | |
| 7 | Resistance Strength | 30 | 40 | 50 | Site SLA using BAS |
| 8 | Threat Capability | 60 | 70 | 80 | TI, MITRE, BAS |
| 9 | Contact Frequency | 1 | 2 | 4 | No change |
| 10 | Probability of Action | 60 | 70 | 80 | Insight from (7), (8) |
| 11 | Risk | $0 | Avg $4.3M | $22.7M | Exceeding tolerable risk |
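For readers who want to trace how the table's inputs combine, below is a simplified sketch of the LEF side of the FAIR taxonomy. Treating each percentage as a triangular distribution and scoring Vulnerability per trial as TCap > RS is my own simplification; FAIR-U's internal engine may differ, and Loss Magnitude is left out entirely:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N = 100_000

def tri(lo, mode, hi, n):
    """Triangular sample; numpy rejects degenerate ranges, so handle lo == hi."""
    return np.full(n, float(lo)) if lo == hi else rng.triangular(lo, mode, hi, n)

def lef(cf, poa, tcap, rs):
    """LEF = TEF x Vulnerability, where TEF = Contact Frequency x Probability
    of Action and, per trial, Vulnerability = 1 when TCap exceeds RS.
    Each argument is a (min, most likely, max) triple from the table."""
    tef = tri(*cf, N) * tri(*poa, N) / 100.0  # threat events per year
    vuln = tri(*tcap, N) > tri(*rs, N)        # did the attack overcome resistance?
    return tef * vuln                         # loss events per year

start = lef(cf=(1, 2, 4), poa=(50, 50, 50), tcap=(50, 50, 50), rs=(40, 60, 80))
bas   = lef(cf=(1, 2, 4), poa=(60, 70, 80), tcap=(60, 70, 80), rs=(30, 40, 50))
print(f"start mean LEF: {start.mean():.2f} loss events/year")
print(f"BAS   mean LEF: {bas.mean():.2f} loss events/year")
```

Even with this crude model, the BAS-informed inputs push the mean LEF an order of magnitude higher than the guesstimate, mirroring the jump in risk values shown in the table.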
In today's ever-changing environment, an annual risk assessment is no longer sufficient. As the threat landscape and our vulnerabilities change, we need to continuously monitor key risk indicators (KRIs) that alert us to changes in LEF.
The Institute of Operational Risk (IOR) defines KRIs as metrics that provide information on the level of exposure to a given operational risk that the organization has at a particular point in time.
KRIs are an early-warning system for changes in our threat landscape and system vulnerabilities, providing the time needed to proactively address changes in our risk posture.
When these KRIs change, we re-run our OT-BAS and enter the new values into the risk register, using a statistical simulation tool for the probability ranges.
To continuously track changes in LEF, we recommend assigning KRIs to TEF and Vulnerability, in particular to Probability of Action, Threat Capability and Resistance Strength.
Example KRIs for LEF:
TEF KRIs: “How many times will the asset face a threat action?”
Vulnerability KRIs: “What percentage of threat events is likely to result in loss events?”
It's important to note that KRI scores are dynamic. They may not arrive as frequently as events per second (EPS) into a SIEM, but KRI alerts need to be timely enough to identify a shift in risk posture and give us the time needed to adjust our defenses.
Changes to a KRI signal a change in the level of risk exposure associated with specific processes and activities. KRIs are thus proactive metrics that organizations use to get an early signal of increasing risk exposure in various areas of the enterprise.
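As a sketch of what such an early-warning check could look like in practice (the KRI names, baselines and thresholds below are all hypothetical):

```python
from dataclasses import dataclass

@dataclass
class KRI:
    name: str
    baseline: float   # value captured at the last risk assessment
    current: float    # latest observed value
    threshold: float  # relative drift that should trigger a re-assessment

    def breached(self) -> bool:
        """Flag the KRI when relative drift from baseline exceeds the threshold."""
        return abs(self.current - self.baseline) / self.baseline > self.threshold

# Hypothetical KRIs tied to TEF and Vulnerability
kris = [
    KRI("new exploit tools relevant to asset (per quarter)", 2, 5, 0.5),
    KRI("external connections to the SUC", 4, 6, 0.25),
]

for kri in kris:
    if kri.breached():
        print(f"KRI breached: {kri.name} -> re-run OT-BAS and update the risk register")
```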
We recommend the following workflow: KRIs are added to the risk register to proactively recommend mitigation controls that would reduce the risk before a loss event happens.
Below is an example of the extended risk register, with quantitative values added to address LEF.
| Field | Value |
|---|---|
| Risk category | OT operational |
| Risk program | Cyber Origin / network connectivity |
| Risk title (scenario) | Loss of control of heat level in boiler tank A12; remote safe shutdown not possible |
| Frequency of scenario as security target, LEF (times per year) | 1 < N <= 5 |
| **Current values after BAS** | |
| Current threat likelihood of risk title (simulated) | 80% (High) |
| Resistance Strength | 40% |
| Threat Capability | 70% |
| Estimated current LEF | 1 < N <= 10 |
| **BAS simulation to SLT3** | |
| Threat likelihood of risk mitigated to IEC 62443 SLT3 (simulated) | 45% (Medium) |
| Resistance Strength | 85% |
| Threat Capability | 70% |
| Estimated simulated mitigated LEF at SLT3 | 1 < N <= 5 |
| Overall impact rating | High (the preceding overall-impact calculation stages are omitted for simplicity) |
| Overall risk rating | High (80%) |
| Risk tolerance | Medium (45%) |
| Risk response | Mitigate the overall risk rating by reducing threat likelihood (raise SLA to SLT3) to reach the tolerance level |
| KRIs for risk title | New ATT&CK techniques and cyber tools; change of asset vendor; change in connectivity to asset; change of project on logic controller |
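If the register is maintained programmatically, an entry like the one above might be represented roughly as follows (the field names and types are my own illustration, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class LEFBand:
    low: int   # exclusive lower bound, loss events per year
    high: int  # inclusive upper bound, loss events per year

@dataclass
class RiskRegisterEntry:
    category: str
    program: str
    title: str
    target_lef: LEFBand          # LEF band set as the security target
    current_likelihood: float    # simulated threat likelihood after BAS (0-1)
    resistance_strength: float   # 0-1
    threat_capability: float     # 0-1
    current_lef: LEFBand
    kris: list[str] = field(default_factory=list)

entry = RiskRegisterEntry(
    category="OT operational",
    program="Cyber Origin / network connectivity",
    title="Loss of control of heat level, boiler tank A12",
    target_lef=LEFBand(1, 5),
    current_likelihood=0.80,
    resistance_strength=0.40,
    threat_capability=0.70,
    current_lef=LEFBand(1, 10),
    kris=["Change of asset vendor", "Change in connectivity to asset"],
)
print(entry.title, "- current LEF band:", vars(entry.current_lef))
```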
So, the next time we are challenged by the “head-scratcher” Loss Event Frequency, we recommend a data-driven approach: use statistical tools such as FAIR-U, and add data points derived from breach and attack simulation runs on your specific production environment, thus narrowing the input ranges for the statistical calculation.
If you’ve found this article interesting, please visit and follow Radiflow on LinkedIn, where you’ll find a wealth of exclusive content.