[SystemSafety] Difference between software reliability and astrology

Prof. Dr. Peter Bernard Ladkin ladkin at causalis.com
Wed Aug 14 13:09:55 CEST 2024


On 2024-08-14 11:07 , Paul Sherwood wrote:
>
> On 2024-08-13 18:59, Prof. Dr. Peter Bernard Ladkin wrote:
>>> Which statistical processes are generally recognised to model software behaviour?
>>
>> We are talking safety-critical software.
>>
>> The answer is memoryless processes. "Memoryless" here is a technical term of statistics. 
>> https://mathworld.wolfram.com/Memoryless.html  It should not be thought to have much, if anything, 
>> to do with "computer memory". Some of us have had years-long "discussions" with engineers who 
>> were convinced it must somehow have something to do with it.
>
> I had not heard of "memoryless" before. Is there a rationale for considering software/system 
> failure probabilities as exponential, or geometric?

Yes, it follows directly from the mathematics of memorylessness, as the Wolfram MathWorld article I 
cited shows. If you find that difficult to get into, you might try Kyle Siegrist's on-line text. Bev 
put me on to it a decade and a half ago and I have been citing it ever since.

https://www.randomservices.org/random/index.html  Especially, in this case, Chapters 10, 13 and 14.
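
To make the memorylessness point concrete, here is a rough numerical sketch in Python (the 
per-demand failure probability and the offsets m and n are arbitrary illustrative values, not data 
from any assessment): if the number of demands T until first failure is geometric, the discrete 
memoryless distribution, then P(T > m+n | T > m) = P(T > n), i.e. having survived m demands tells 
you nothing about the next n.

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.01          # illustrative constant per-demand failure probability
    trials = 200_000  # number of simulated time-to-first-failure observations

    # Number of demands to first failure of a Bernoulli process is geometric.
    t = rng.geometric(p, size=trials)

    m, n = 50, 30
    # Memorylessness: P(T > m+n | T > m) should equal P(T > n).
    lhs = np.mean(t[t > m] > m + n)
    rhs = np.mean(t > n)
    print(f"P(T > m+n | T > m) ~= {lhs:.4f}")
    print(f"P(T > n)           ~= {rhs:.4f}")

Both estimates come out at about (1-p)^n, as the algebra says they must.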

>>> [PS] From previous discussions here and elsewhere (and prior consideration of ISO26262) I was 
>>> under the impression that software is 'generally recognised' to be (ideally) deterministic?
>>
>> [PBL] Did you read the note I forwarded?
>
> I confess I was put off when I saw the opening statement "Software execution is deterministic; 
> therefore it is not stochastic;..." etc.

That didn't come from me. It came from an Australian critical-systems assessor, M, commenting on the 
new proposed version of 61508-7 Annex D. He doubted that on-demand functions can be said to have an 
average probability of failure on demand (PFD_[avg]); and he doubted that continuously-operating 
functions can be said to have a Mean Time To Failure (MTTF). Those quantities, PFD_[avg] and MTTF, 
are what are known as statistical parameters; that is, their meaningful existence follows from the 
set-up. I reconstructed PFD_[avg] from the behaviour of an on-demand function for a given 
operational profile, so he was wrong about that. But I haven't yet managed to reconstruct MTTF for 
continuous software.
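
Roughly, and only as an illustrative sketch rather than the full reconstruction: if the operational 
profile partitions demands into classes, each occurring with a fixed probability, and the function 
has a fixed probability of failing on a demand of each class, then PFD_[avg] is the profile-weighted 
average. The class names and all the numbers below are invented for illustration.

    # Sketch: PFD_avg for an on-demand function, given an operational profile
    # over demand classes. All figures are hypothetical.

    operational_profile = {   # probability that a demand falls in each class
        "class_A": 0.70,
        "class_B": 0.25,
        "class_C": 0.05,
    }
    failure_prob = {          # probability of failure on a demand of that class
        "class_A": 1e-5,
        "class_B": 4e-5,
        "class_C": 2e-4,
    }

    # PFD_avg is the profile-weighted average probability of failure on demand.
    pfd_avg = sum(operational_profile[c] * failure_prob[c] for c in operational_profile)
    print(f"PFD_avg = {pfd_avg:.2e}")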

I did try deconstructing continuous SW, say a feedback control system, as (a) a rapid (hundreds of 
Hz) polling/sampling routine, which then calls (b) an on-demand routine; and then (c) considering 
(a) as a very rapid Bernoulli process. That gives me the desired result, but it is wrong: you can't 
consider (a) to be a Bernoulli process, because it is not memoryless. In order, for example, to 
calculate rates of change, your sampler needs a history of recent values (whether accumulated in a 
history variable or not). And if it is retaining history then it ain't memoryless, so it can't be 
Bernoulli: (c) is wrong.
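
A toy sketch of the point, in Python (the threshold and the readings are invented): a rate-of-change 
check has to retain the previous sample, so whether cycle k flags a fault depends on what cycle k-1 
saw, and successive polling cycles are not independent trials.

    # Why the polling routine (a) is not memoryless: a rate-of-change check
    # needs the previous sample, so the outcome of one cycle depends on the last.

    class RateLimiter:
        """Flags a fault if the sampled value changes too fast between cycles."""
        def __init__(self, max_step):
            self.max_step = max_step
            self.previous = None          # retained history -> not memoryless

        def poll(self, value):
            fault = (self.previous is not None
                     and abs(value - self.previous) > self.max_step)
            self.previous = value         # state carried into the next cycle
            return fault

    limiter = RateLimiter(max_step=5.0)
    for v in [20.0, 21.0, 30.0, 30.5]:    # hypothetical sensor readings
        print(limiter.poll(v))            # third cycle's outcome depends on the second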

M has extensive assessment experience. 61508 orients itself towards components, analysing the 
components, and putting components together. There is a question whether this continues to be 
appropriate; many larger engineering projects are adopting the "systems engineering" approach, 
detailed in the INCOSE Handbook, which starts with top-level overall requirements and purpose and 
derives detailed design essentially through refinement steps, and the analyses parallel the 
refinement. People such as myself would argue that fundamental safety assessments such as Hazard 
Analysis benefit from that -- but then I have a Hazan method, Ontological Hazard Analysis, which 
proceeds by refinement, so of course I would think that.

But the thing about assessing to 61508 is that you are focusing on components. I think this is also 
true of ISO 26262. If you are focusing on components then you are looking at putting simple stuff 
together to obtain more complicated functionality. So you are generally looking at software of 
limited functionality, or very limited functionality, which is meant to run the component, the 
"simple stuff". And if most of what you do is looking at that, then there is a very practical sense 
in which the software you are inspecting can be considered to be deterministic. If you are looking 
at a robust, protected thermometer then you are not encountering race conditions. If you have a 
slow, time-triggered network going to your data integrator, then you are not looking at race 
conditions either -- you are probably looking more at data integrity checks. That's where M is 
coming from. It is not so much that this kind of SW is deterministic as that you can be, and some 
would argue should be, treating it as deterministic for the purposes of assessment.

PBL

Prof. Dr. Peter Bernard Ladkin
Causalis Limited/Causalis IngenieurGmbH, Bielefeld, Germany
Tel: +49 (0)521 3 29 31 00


