[SystemSafety] Difference between software reliability and astrology

Paul Sherwood paul.sherwood at codethink.co.uk
Tue Aug 20 20:40:32 CEST 2024


On 2024-08-14 12:09, Prof. Dr. Peter Bernard Ladkin wrote:
>>>> Which statistical processes are generally recognised to model 
>>>> software behaviour?
>>> 
>>> We are talking safety-critical software.
>>> 
>>> The answer is memoryless processes. "Memoryless" here is a technical 
>>> term of statistics. https://mathworld.wolfram.com/Memoryless.html  It 
>>> should not be thought to have much, if anything, to do with "computer 
>>> memory". Some of us have had years-long "discussions" with engineers 
>>> who were convinced it must somehow have to do with it.
>> 
>> I had not heard of "memoryless" before. Is there a rationale for 
>> considering software/system failure probabilities as exponential, or 
>> geometric?
> 
> Yes, it follows directly from the mathematics of memorylessness, as the 
> Wolfram Mathworld article I cited shows. If you find that difficult to 
> get into, you might try Kyle Siegrist's on-line text. Bev put me on to 
> it a decade and a half ago and I've been citing it since.
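
To convince myself, I worked the memoryless property through in plain 
Python. A minimal sketch (the numbers are hypothetical, and this is just 
the textbook geometric case, nothing specific to software):

    # The geometric distribution -- the number of Bernoulli demands up
    # to and including the first failure, with per-demand failure
    # probability p -- is memoryless: P(X > s + t | X > s) = P(X > t).
    p, s, t = 1e-3, 500, 1000

    def surv(n):                  # survival function P(X > n)
        return (1.0 - p) ** n

    lhs = surv(s + t) / surv(s)   # P(X > s + t | X > s)
    rhs = surv(t)                 # P(X > t)
    print(lhs, rhs)               # identical up to rounding

Surviving the first s demands tells you nothing about the next t, which 
is why the exponential (continuous time) and geometric (discrete 
demands) distributions fall directly out of the memoryless assumption.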

I've referred to your 2016 paper "Practical Statistical Evaluation of 
Critical Software", and also "Some Practical Issues in Statistically 
Evaluating Critical Software" from 2015, which have helped me to 
understand the approach.

These memoryless distributions seem to be chosen to model relatively 
simple software. They are not really applicable to non-binary 
behaviours, to software that maintains state about what is going on, to 
software that is updated frequently or runs on multicore machines, or 
to cases where the failure rate is not constant.

As you said in 2015:

"We conclude that establishing the reliability of RTOS practically using 
the Bernoulli/Poisson mathematics in this manner looks close to 
infeasible. Yet Annex D currently states in its second sentence “This 
approach is considered particularly appropriate as part of the 
qualification of operating systems, [etc.]” !"

It seems to me that for complex software in general, we'll need 
something better?
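
The arithmetic behind "close to infeasible" is stark. A minimal sketch, 
assuming the standard zero-failure Poisson argument (the target rates 
below are hypothetical): with no failures observed in t operating 
hours, confidence 1 - alpha that the true rate is at most lambda 
requires exp(-lambda * t) <= alpha, i.e. t >= ln(1/alpha) / lambda.

    import math

    # Failure-free operating hours needed to support a rate bound
    # lambda at 95% confidence under the Poisson/exponential model:
    # exp(-lam * t) <= alpha  =>  t >= ln(1/alpha) / lam
    alpha = 0.05
    for lam in (1e-5, 1e-7, 1e-9):        # target failures per hour
        t = math.log(1.0 / alpha) / lam   # required failure-free hours
        print(f"lambda={lam:.0e}/h: {t:.2e} h (~{t / 8760.0:,.0f} years)")

For a 1e-9 per hour target that is about 3e9 failure-free hours, which 
is why demonstrating such figures for a single copy of an RTOS by 
operational testing looks infeasible.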

> https://www.randomservices.org/random/index.html  Especially, in this 
> case, Chapters 10, 13 and 14.
> 
>> [PS] From previous discussions here and elsewhere (and prior 
>> consideration of ISO 26262) I was under the impression that software is 
>> 'generally recognised' to be (ideally) deterministic?
>>> 
>>> [PBL] Did you read the note I forwarded?
>> 
>> I confess I was put off when I saw the opening statement "Software 
>> execution is deterministic; therefore it is not stochastic;..." etc.
> 
> That didn't come from me. It came from an Australian critical-systems 
> assessor, M, commenting on the new proposed version of 61508-7 Annex D. 
> He doubted that on-demand functions can be said to have an average 
> probability of failure on demand (PFD_[avg]); and he doubted that 
> continuously-operating functions can be said to have a Mean Time To 
> Failure (MTTF). Those quantities, PFD_[avg] and MTTF, are what are 
> known as statistical parameters: their meaningful existence follows 
> from the set-up. I reconstructed PFD_[avg] from the behaviour 
> of an on-demand function for a given operational profile, so he was 
> wrong about that. But I haven't yet managed to reconstruct MTTF for 
> continuous software.
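
If it helps others following along, here is my reading of that 
reconstruction, as a minimal sketch with hypothetical numbers (not 
Peter's actual derivation): PFD_[avg] comes out as the demand-weighted 
average of per-class failure probabilities over the operational profile.

    # Hypothetical operational profile: each demand falls into class i
    # with probability profile[i]; the function fails on a class-i
    # demand with probability pfd[i]. PFD_avg = sum_i profile[i]*pfd[i].
    profile = [0.70, 0.25, 0.05]   # demand-class probabilities (sum to 1)
    pfd     = [1e-6, 5e-6, 1e-4]   # per-class failure probability on demand
    pfd_avg = sum(w * q for w, q in zip(profile, pfd))
    print(f"PFD_avg = {pfd_avg:.2e}")   # 6.95e-06 for these numbers
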
> 
> I did try deconstructing continuous SW, say a feedback control system, 
> as (a) a rapid (hundreds of Hz) polling/sampling routine, which then 
> calls (b) an on-demand routine; and then (c) considering (a) as a very 
> rapid Bernoulli process. That gives me the desired result, but it is 
> wrong: you can't consider (a) to be a Bernoulli process, because it is 
> not memoryless. In order, for example, to calculate rates of change, 
> your sampler needs a history of recent values (whether accumulated in a 
> history variable or not). And if it is retaining history then it ain't 
> memoryless, so it can't be Bernoulli: (c) is wrong.
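
That matches what I see in practice: even the most trivial 
rate-of-change sampler retains state. A minimal sketch (hypothetical 
controller code, not from any real system):

    # A sampler that estimates rate of change must keep the previous
    # sample. Whether this cycle's output is correct depends on earlier
    # cycles, so the per-cycle "trials" are not independent -- and a
    # process with dependent trials is not a Bernoulli process.
    class RateSampler:
        def __init__(self, dt):
            self.dt = dt        # sampling interval in seconds
            self.prev = None    # history carried between cycles

        def step(self, value):
            rate = 0.0 if self.prev is None else (value - self.prev) / self.dt
            self.prev = value   # state: a corrupted prev propagates
            return rate         # into the next cycle's output
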
> 
> M has extensive assessment experience. 61508 orients itself towards 
> components, analysing the components, and putting components together. 
> There is a question whether this continues to be appropriate; many 
> larger engineering projects are adopting the "systems engineering" 
> approach, detailed in the INCOSE Handbook, which starts with top-level 
> overall requirements and purpose and derives detailed design 
> essentially through refinement steps, and the analyses parallel the 
> refinement. People such as myself would argue that fundamental safety 
> assessments such as Hazard Analysis benefit from that -- but then I 
> have a Hazan method, Ontological Hazard Analysis, which proceeds by 
> refinement, so of course I would think that.
> 
> But the thing about assessing to 61508 is that you are focusing on 
> components. I think this is also true of ISO 26262. If you are focusing 
> on components then you are looking at putting simple stuff together to 
> obtain more complicated functionality. So you are generally looking at 
> software of limited functionality, or very limited functionality, which 
> is meant to run the component, the "simple stuff". And if most of what 
> you do is looking at that, then there is a very practical sense in 
> which the software you are inspecting can be considered to be 
> deterministic. If you are looking at a robust, protected thermometer 
> then you are not encountering race conditions. If you have a slow, 
> time-triggered network going to your data integrator, then you are not 
> looking at race conditions either -- you are probably looking more at 
> data integrity checks. That's where M is coming from. It is not so much 
> that this kind of SW is deterministic as that you can, and some will 
> argue should, treat it as deterministic for the purposes of 
> assessment.

Your explanations have been very helpful, Peter, thank you.

So, in summary, if I understand correctly:

- the standards have been oriented towards simpler software (with 
justification, because complexity makes safety more difficult), and 
simple software (particularly software designed for safety-critical use 
running on simple hardware) can be considered practically deterministic.

- for more complex software the Bernoulli/Poisson model may be 
applicable in some cases, but not generally.

br
Paul

