[SystemSafety] Difference between software reliability and astrology

Prof. Dr. Peter Bernard Ladkin ladkin at techfak.de
Wed Aug 21 17:28:55 CEST 2024


On 2024-08-21 15:48 , Paul Sherwood wrote:
> On 2024-08-21 12:08, Prof. Dr. Peter Bernard Ladkin wrote:
>> First, you are talking about using an operating system. An operating system is a 
>> continuously-running system, not a discrete on-demand function which returns an output value.
>
> Hmmm. Let's break that apart...

Let's not. For the purposes of assessing reliability, it's not that relevant.

>> So its failure behaviour is not a Bernoulli process. You can drop the "Bernoulli" bit.
>
> From a physical perspective, the behaviour of such a constructed system appears continuous, but 
> considering what the OS itself is actually doing, every action is discrete. 

So what? Suppose you have a sensor sampling at 400 Hz (typical for aircraft-dynamics sensors, for 
example). The piece of SW dealing with those readings (the control system) is going to want to 
ascertain rates of change and similar derived quantities, so it needs to keep a history of readings 
(over a short period of time). If you have history variables then you aren't memoryless: what the 
SW does at one step depends on what it saw at earlier steps, so successive steps are not 
independent trials. If you're not memoryless then you aren't a Bernoulli process, discrete or not.
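
To make the history-variable point concrete, here is a minimal sketch in Python. The class 
RateEstimator, its window size, and the sample values are all hypothetical, invented for 
illustration; only the 400 Hz figure comes from the example above. It is not any real control 
law, just a demonstration that a component with history gives different outputs for the same 
current reading.

```python
from collections import deque

SAMPLE_RATE_HZ = 400  # sampling rate from the example above


class RateEstimator:
    """Hypothetical controller fragment that keeps a short history of readings."""

    def __init__(self, window=4):
        # History variable: the last `window` sensor readings.
        self.history = deque(maxlen=window)

    def step(self, reading):
        """Record one reading and return the estimated rate of change per second."""
        self.history.append(reading)
        if len(self.history) < 2:
            return 0.0
        intervals = len(self.history) - 1
        # Average slope across the stored window, in units per second.
        return (self.history[-1] - self.history[0]) * SAMPLE_RATE_HZ / intervals


def final_rate(readings):
    """Feed a sequence of readings to a fresh estimator and return its last output."""
    estimator = RateEstimator()
    rate = 0.0
    for reading in readings:
        rate = estimator.step(reading)
    return rate


# Both sequences end on the same reading (3.0) but have different histories.
rising = final_rate([0.0, 1.0, 2.0, 3.0])
flat = final_rate([3.0, 3.0, 3.0, 3.0])
print(rising, flat)
```

The two outputs differ although the final input is identical, so whether a given step behaves 
acceptably cannot be treated as an independent trial with a fixed probability.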

>> But keep in mind you can't be letting [the OS] fail. For SIL 4 safety functions, it has to be 
>> running more than 100 million operating hours between failures on average. That is the constraint 
>> from 61508-1 Table 3, which is independent of any means of describing the failure behaviour.
>
> Understood, but I wonder a bit about the numbers in the table. Can you (or anyone on the list) 
> help me understand how the committee arrived at 10^-5, 10^-6, 10^-7, 10^-8 as targets?
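
For the arithmetic linking those targets to the "100 million operating hours" figure, here is a 
small sketch in Python. The assignment of 10^-5 ... 10^-8 to SIL 1 ... SIL 4 as upper bounds on 
the average frequency of dangerous failure per hour is my reading of 61508-1 Table 3 (high 
demand or continuous mode) and should be checked against the standard; the function and 
variable names are made up for illustration. The only step is taking the reciprocal of an 
average rate, which assumes nothing about the distribution of failures.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years

# Assumed mapping: SIL -> n, where the upper bound on the average
# dangerous-failure frequency is 10^-n per hour.
UPPER_BOUND_EXPONENT = {1: 5, 2: 6, 3: 7, 4: 8}


def min_mean_hours_between_failures(sil):
    """Reciprocal of the upper-bound rate: 1 / 10^-n per hour = 10^n hours."""
    return 10 ** UPPER_BOUND_EXPONENT[sil]


def hours_to_years(hours):
    """Convert operating hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


for sil in sorted(UPPER_BOUND_EXPONENT):
    hours = min_mean_hours_between_failures(sil)
    years = hours_to_years(hours)
    print(f"SIL {sil}: more than {hours:,} h between failures on average "
          f"(about {years:,.0f} years of continuous running)")
```

For SIL 4 this gives 100,000,000 hours, i.e. well over eleven thousand years of continuous 
operation for a single unit.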

(1) There is no theoretical reason why powers of 10 are chosen.

(2) They come from the aerospace regulations, and the "acceptable means of compliance" (AMC). The 
regs contain certain powers of ten for "hazardous condition" and "catastrophic condition" and 
sometimes other hazard classes ("minor" and "major"), and the AMC nowadays interprets phrases such 
as "not expected to occur within the lifetime of the aircraft [fleet]" as probabilities expressed 
in powers of ten. The reason 61508 took them over is likely that civil air transport had been 
having continual and improving success with what in effect turns out to be its risk matrix, for 
half a century before 61508 came along.

PBL

Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany
www.rvs-bi.de