[SystemSafety] Road to Damascus moment in Functional safety engineering - was FOSDEM talk by Paul Sherwood
Les Chambers
les at chambers.com.au
Thu Feb 13 12:54:14 CET 2025
Phil and Rolf,
I humbly offer my comments on your paper, "Continuous Learning Approach to
Safety Engineering".
Firstly, what a relief: finally someone has stated the bleeding obvious. Up to
this point, this list has been silent on the wicked problem of how the systems
engineering process must change in the light of artificial intelligence.
I note that this is not a phase change; it's an inflection point where all
the work of the past 50 years has been blown up, run over by elephants if you
will. Systems engineers, we are in a road to Damascus moment with the bright
light of artificial intelligence shining down upon us. We are embarked, like
it or not; what shall we do?
[Phil & Rolf] We have a reasonable basis from decades of deployed systems in
a variety of domains to conclude that such standards tend to help ensure
safety. But we don't know exactly how they accomplish this, nor the degree to
which safety outcomes are influenced by the specific activities required by
standards.
I disagree. I do know exactly how they accomplish this. Working for decades at
the Systems Engineering coal face with several companies that encountered
these standards for the first time, I have seen a step-change increase in
organisational capability maturity. Given that the standards were invariably
attached, as compliance requirements, to multi-million-dollar contracts, my
clients were forced to behave professionally. By this I mean they actually did
what the gurus recommended in the Systems Engineering textbooks: simple things
like formality in requirements capture, independent V&V, formal functional
safety processes and a formal approach to configuration management. There has
always been, in our industry, a massive gap between knowing and doing. Any
educated software engineer knows the right thing to do; getting his company to
actually do what should be done requires some kind of driving force. The
prospect of your invoice not being paid by the client provides that force. In
the words of General Westmoreland, "When you've got 'em by the balls, their
hearts and minds tend to follow."
[Phil & Rolf] Moreover, increasing software content and adoption of novel
technologies such as machine learning are dramatically increasing the
complexity of deployed safety critical systems. The role of current integrity-
level based approaches is in doubt for such future systems without significant
changes.
I disagree. Significant changes are not the solution. Where an AI is deployed
as part of a control system, the standards are flat-out non-applicable and
need to be binned.
For example, IEC 61508 is based on the assumption that formal requirements
capture will result in one or more complete, correct and unambiguous
system/software requirements specifications that can be validated with
human-executable tests. Take a Tesla motor vehicle. Tesla's requirements
specification is thousands of hours of video, some of it synthetically
generated. Their final system validation amounts to putting a vehicle on the
road and counting how many times the driver has to correct its trajectory -
that is, if the driver is paying attention. In this environment, catastrophic
test failure means vehicle damage, injury, or death. WHAT!!!!
[Phil & Rolf] Currently prescribed measures might be insufficient to guarantee
intended safety integrity due to uncertainty as to the predictive power of
integrity level engineering practices vs. real-world safety outcomes.
As far as I know, there are no currently prescribed measures for evaluating
the safety integrity of a large language model (LLM) - at least nothing that,
using the classical EN 50128-esque criteria, would justify its deployment in a
safety-critical control system. The behaviour of this entity is a mystery even
to its creators. In fact, the modern discipline of model interpretability,
first articulated around 2016, focuses on understanding how models - including
large language models (LLMs) such as GPT-4 - generate outputs from given
inputs. See Lipton, "The Mythos of Model Interpretability". Note the language
of this title. This is a research project with no clear outcome as of today.
The fact that LLMs exhibit any intelligence at all came as a surprise to their
creators around 2015. You could say it's almost a side-effect of their
research. There are parallels here with Fleming's discovery of penicillin
(gasps of "OMG, it's alive"). The net result of this accidental, emergent
intelligence is that millions of lines of code are currently being replaced by
a blob of data which, when stimulated by software, is nondeterministic and not
understood by its creators. (Jesus wept!)
My point is: do you really want this LLM component determining the trajectory
of your motor vehicle while barrelling down a two-lane road at a relative
speed of 200 km/h to the oncoming traffic - when its developer doesn't
understand how it works, has not documented how it should work with a
validatable specification, and thinks it's legitimate to foist it on you with
a YOLO release under the cover of the name "Full Self-Driving"?
And when you die it's, "Aw shucks, we WILL do better."
[Phil & Rolf] ... which analysis and architectural patterns are providing how
much contribution to safety outcomes ...
In systems using AI, the traditional hierarchy of elements has been replaced
with a blob of data. In the past we could test that hierarchy with progressive
module, unit, subsystem, integration and system testing. Now the total
hierarchy has been replaced by a nondeterministic blob. Validating the blob is
currently a research project. Controlling the behaviour of the blob is another
one - see constitutional AI, a proactive effort to embed ethical
considerations into AI systems. Ethical considerations? Engineers, let us gird
our loins and focus on one small aspect of ethics: let us do no harm to users.
[Phil & Rolf] They have no way to argue the predictive power of their safety
case for real world safety outcomes other than experts say following
prescribed engineering rigor requirements should be OK.
At the current state of play, I cannot imagine how a credible safety case is
possible if nondeterministic AI is deployed in a motor vehicle. I would love
to be proved wrong.
My attempts to validate this statement typically turn up the same response.
ChatGPT prompt:
"In the context of systems engineering of Tesla motor vehicles, has Tesla
released any information on the engineering process for safety-critical
control systems in their motor vehicles?"
Judge the response for yourself.
I do not find the following element of ChatGPT's response credible, given that
guidance for validation of LLMs is, as yet, a work in progress:
ChatGPT QUOTE:
Safety Standards Compliance: Tesla emphasizes adherence to safety standards
relevant to the automotive industry, including guidelines from organizations
like ISO 26262, which pertains to the functional safety of electrical and
electronic systems in vehicles.
END QUOTE
New Age safety cases
[Phil & Rolf] For the first phase, we propose using a specific formulation of
a Safety Performance Indicator (SPI) as a quantitative measure for claim
satisfaction: An SPI is a metric supported by evidence that uses a threshold
comparison to condition a claim in a safety case [6]. Any quantitative
computations are encapsulated into a threshold comparison, and the result is a
logic value related to the truth of the associated claim.
In the context of safety-critical systems and the development of safety cases,
a safety claim is a statement made by the system developers asserting that a
system is safe for a specific set of intended uses or contexts. The claim is
typically supported by evidence gathered through various means, such as
testing, analysis, and verification activities. Safety claims are crucial
components of safety cases, which systematically present arguments and
evidence to demonstrate that the system's safety requirements have been met.
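To make the SPI mechanics concrete, here is a minimal sketch (in Python) of
the threshold-comparison idea, using a hypothetical driver-disengagement-rate
metric. The claim wording, threshold value and observed rate are my own
assumptions for illustration; they are not drawn from Phil and Rolf's paper or
from any fleet data:

# Minimal sketch of a Safety Performance Indicator (SPI) as defined above:
# all quantitative computation collapses into a threshold comparison that
# yields a logic value conditioning a safety-case claim.
# The metric, threshold and numbers below are hypothetical.

from dataclasses import dataclass

@dataclass
class SPI:
    claim: str          # the safety-case claim this SPI conditions
    threshold: float    # acceptable upper bound for the metric

    def evaluate(self, metric_value: float) -> bool:
        # The result is a logic value related to the truth of the claim.
        return metric_value <= self.threshold

# Hypothetical example: driver trajectory corrections per 1,000 km on road
spi = SPI(claim="Lateral control requires no more than one driver "
                "correction per 1,000 km of on-road operation",
          threshold=1.0)

observed_rate = 0.7  # would come from fleet evidence; invented here
print(spi.evaluate(observed_rate))  # True -> claim conditionally supported

The mechanics are trivially simple, as the sketch shows; my difficulty is with
where the claims and the evidence come from.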
But what if a set of discrete system safety requirements is not specified, and
is only implied by a continuum of situations presented as real-life or
synthetic video, with human or synthetic judgement that the system's response
to some stimulus, such as a child crossing the road, is in fact safe?
Sensing of the vehicle's environment is also within the scope of safety
claims. Accurate sensing of the vector field surrounding the vehicle - i.e.
anything animal or mineral with a trajectory likely to bring it into contact
with the vehicle - must be evaluated. The vehicle's AI must have an accurate
world model to drive its control decision-making. It must know that mowing
down a mother with child is considered bad form, but that driving on in the
face of a paper bag blowing in the wind is permissible.
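To illustrate what "a trajectory likely to bring it into contact with the
vehicle" means in practice, here is a toy Python sketch of a world-model check
that flags tracked objects whose closest point of approach falls inside an
assumed safety envelope. Every class name, field, unit and threshold is my own
assumption for illustration; this is not Tesla's architecture or anything
prescribed by a standard:

# Toy sketch: flag any tracked object whose straight-line relative trajectory
# brings it within an assumed safety envelope of the ego vehicle within an
# assumed look-ahead horizon. Names, units and thresholds are illustrative.

from dataclasses import dataclass
import math

@dataclass
class Track:
    label: str    # e.g. "pedestrian", "paper_bag"
    x: float      # position relative to ego vehicle, metres
    y: float
    vx: float     # velocity relative to ego vehicle, metres/second
    vy: float

def closest_approach(t: Track) -> tuple[float, float]:
    """Return (time, distance) of the track's closest approach to the ego vehicle."""
    speed_sq = t.vx ** 2 + t.vy ** 2
    if speed_sq < 1e-9:  # effectively stationary relative to the ego vehicle
        return 0.0, math.hypot(t.x, t.y)
    tau = max(0.0, -(t.x * t.vx + t.y * t.vy) / speed_sq)
    return tau, math.hypot(t.x + t.vx * tau, t.y + t.vy * tau)

SAFETY_ENVELOPE_M = 2.0  # assumed required clearance
HORIZON_S = 5.0          # assumed look-ahead horizon

def conflicting(tracks: list[Track]) -> list[Track]:
    hits = []
    for t in tracks:
        tau, dist = closest_approach(t)
        if tau <= HORIZON_S and dist <= SAFETY_ENVELOPE_M:
            hits.append(t)
    return hits

# Hypothetical tracks: an oncoming pedestrian on a collision course and a
# paper bag that passes well clear.
tracks = [Track("pedestrian", x=20.0, y=1.0, vx=-8.0, vy=0.0),
          Track("paper_bag", x=15.0, y=6.0, vx=-8.0, vy=0.0)]
for t in conflicting(tracks):
    print("conflict:", t.label)   # prints only the pedestrian

The geometry is the easy part. The hard part - deciding which tracked object
is a child and which is a paper bag - is exactly where the learned world
model, and all the validation trouble, comes in.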
The concept of discrete safety requirements, and discrete tests validating
those safety requirements, is now irrelevant.
Further, the number of situations thrown up by thousands of hours of on-road
video would be approaching infinity by now, consigning human-executable
validation testing to the Paleolithic past - along with the standards that are
encouraging us to perform this impossible task. It is clear to me that the
only practical solution is to send a validating AI to catch a misbehaving AI -
which raises the obvious question: who or what validates the validator?
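For what it is worth, one plausible shape for that validating AI is an
independent runtime monitor that can veto the primary controller's output and
substitute a simple, analysable fallback. The Python sketch below is a toy
illustration of the pattern only; the limits, names and commands are my
assumptions, not anyone's published architecture:

# Toy sketch of the "validating AI catches a misbehaving AI" pattern:
# an independent monitor vetoes the primary controller's command and
# substitutes a simple fallback. All names and limits are assumptions.

from typing import Callable

MAX_STEERING_RATE_DEG_S = 30.0  # assumed plausibility limit
MAX_DECEL_M_S2 = 8.0            # assumed plausibility limit

Command = tuple[float, float]   # (steering rate deg/s, deceleration m/s^2)

def monitor_ok(steering_rate: float, decel: float) -> bool:
    """Independent plausibility check on the primary controller's command."""
    return (abs(steering_rate) <= MAX_STEERING_RATE_DEG_S
            and 0.0 <= decel <= MAX_DECEL_M_S2)

def supervised_command(primary: Callable[[], Command],
                       fallback: Callable[[], Command]) -> Command:
    steering_rate, decel = primary()   # opaque, possibly learned, controller
    if monitor_ok(steering_rate, decel):
        return steering_rate, decel
    return fallback()                  # simple, analysable safe action

# Hypothetical usage: the primary demands an implausible steering rate, so
# the monitor falls back to "hold lane, brake gently".
print(supervised_command(primary=lambda: (45.0, 2.0),
                         fallback=lambda: (0.0, 3.0)))   # -> (0.0, 3.0)

The intent of the pattern is that the monitor is simple enough to validate by
conventional means, which is one partial answer to the question - but only if
the monitor itself is not another blob.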
I could go on, but I hope the above is enough to support my assertion that,
given technology has changed so radically, we need to respond by approaching
development operations with a fresh perspective based on what AI can do for us
in the development shop. The concepts of accurate world models, validating
agents and validatable constitutional AIs need to be front and centre.
Epilogue
By an accident of history, in a stroke of extreme good luck, my career in
systems engineering has spanned the first 50 years of the new era in which
software was used in anger in control systems - where software was given
complete control of chemical reactors, for example. It has been a wild and
exciting ride. In our ignorance, some of the things we did in those early days
(1970-1990) give me dark nights of the soul even today. But slowly we learnt,
and ultimately our learning was codified in the highly effective standards we
currently have.
But
the playbook that we so lovingly assembled over the past 50 years is in the
process of being thrown away - or has already been thrown away - where large
language models are deployed as components of control systems, replacing
thousands of lines of code.
There is no point in complaining; we need to keep on. Some of the big
questions we need to answer are:
1. How to validate an AI prior to delivery.
2. How to progressively validate an AI that routinely learns and changes its
behaviour after initial delivery.
3. How governments can regulate the safety integrity of motor vehicles and
other appliances, including aircraft (gasp), controlled by AI. I note that the
answer is not self-regulation - look at what happened with the Boeing 737 MAX.
Phil and Rolf, once again, thanks for raising the issue. It requires robust
and illuminating debate followed by solutions. I'm a bit sad it won't be my
generation that delivers the solution.
Over to you, next-generation systems engineers.
Good luck
Les
> Paul,
>
> Thanks for sharing this. Hot take: felt more like a sales pitch than a
> really concrete explanation. Old ways dismissed based more on "nobody
> wants to do it" and "new is better". I get why that argument has
> appeal, and surely some (not all) people don't do the old ways as well
> as they should. But it was pretty light on why "new will be sufficient
> for acceptable safety" vs. deciding that doing the new ways really well
> will automatically be fine. I just recently saw a live talk that had a
> lot of the same arguments and was similarly left with more questions
> than answers on this topic. Maybe those in the trenches on the new
> technology have a different viewpoint. To be sure, this is not me trying
> to die on the hill of the old ways, but rather expressing what I think
> is appropriate caution on jumping to new system design approaches on
> life critical systems.
>
> We can justify the old way in hindsight in that it seems to work, even
> if we struggle to rigorously explain why. Do we want to jump to a new
> way without understanding why it is expected to work and spend decades
> of mishaps climbing the hill to getting to where it really needs to be?
> Or is there a way to have some confidence about it before then?
>
> Assuming they make good on everything they plan, what would help me a
> lot is understanding the basis for the safety case for their approach in
> a general sense. What parts of assurance does it provide, and what
> parts does it not provide? What underlying assumptions does it make?
> As a simple example, supply chain attacks will be a huge issue compared
> to a proprietary OS + custom application. This is not news to them, but
> is an example of the type of new challenge that might surprise us with
> a mishap news headline.
>
> Discussion here is fine, but this is not something we are likely to
> resolve in an e-mail chain. This is a whole discussion that needs to
> happen over many years in many forums. And it is not just about FOSS in
> general. There are these concerns plus additional specific concerns
> about using machine learning technology, tool chains and libraries as
> well, in which we already have multi-ton machines hurtling down public
> roads by companies who have decided (most, but not all of them) that
> core safety standards -- including ones specifically for their
> technology -- are irrelevant because they stifle innovation.
>
> We might not be happy with old-school safety practices, but there are
> lessons there learned the hard way over decades. We should be reluctant
> to throw them out wholesale without taking some time to figure out what
> lessons we need to learn with the new approaches.
>
> My own take on this topic in a somewhat more abstract discussion is
> here, in which I point out the gaps in understanding how/why current
> approaches work and how we might close those gaps going forward for any
> approach:
>
> * Johansson, R. & Koopman, P., "Continuous Learning Approach to Safety
> Engineering," Critical Automotive Applications: Robustness & Safety / CARS
> at EDCC2022.
> * https://users.ece.cmu.edu/~koopman/pubs/Johansson2022_CARS_ContinuousLearningSafety.pdf
>
> Kind regards,
> Phil
>
> On 2/8/2025 10:03 PM, paul_e.bennett at topmail.co.uk wrote:
> > I cannot recall if this might have been linked in recently, but from
> > the date of the talk, probably not.
> >
> > <https://fosdem.org/2025/schedule/event/fosdem-2025-6204-the-trustable-software-framework-a-new-way-to-measure-risk-in-continuous-delivery-of-critical-software/>
> >
> > Paul gives a view on the attempt to evaluate Free Open Source
> > Software for trustability. I offer the link with no further comment
> > or claim from my side. I'll let people make up their own minds.
> >
> > Regards
> >
> > Paul E. Bennett IEng MIET
> > Systems Engineer
> > Lunar Mission One Ambassador
>
> --
> Prof. Phil Koopman, koopman at cmu.edu
> (he/him), https://users.ece.cmu.edu/~koopman/
--
Les Chambers
les at chambers.com.au
https://www.chambers.com.au
https://www.systemsengineeringblog.com
+61 (0)412 648 992