[SystemSafety] Difference between software reliability and astrology
Paul Sherwood
paul.sherwood at codethink.co.uk
Wed Aug 21 18:26:21 CEST 2024
On 2024-08-21 16:28, Prof. Dr. Peter Bernard Ladkin wrote:
>>> First, you are talking about using an operating system. An operating
>>> system is a continuously-running system, not a discrete on-demand
>>> function which returns an output value.
>>
>> Hmmm. Let's break that apart...
>
> Let's not. For the purposes of assessing reliability, it's not that
> relevant.
It's relevant for us - we are trying to distinguish between software
reliability and system reliability.
>>> So its failure behaviour is not a Bernoulli process. You can drop the
>>> "Bernoulli" bit.
>>
>> From a physical perspective, the behaviour of such a constructed
>> system appears continuous, but considering what the OS itself is
>> actually doing, every action is discrete.
>
> So what? Suppose you have a sensor sampling at 400 Hz (typical for
> aircraft-dynamics sensors, for example). The piece of SW dealing with
> those readings (aka control system) is going to want to ascertain rates
> of change and other stuff, so it needs to keep a history of readings
> (over a short period of time). If you have history variables then you
> aren't memoryless. If you're not memoryless then you aren't a Bernoulli
> process, discrete or not.
I already agreed with you - I don't believe complex software behaviour
can be considered a memoryless process.
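To illustrate the point about history variables, here is a minimal
sketch. Only the 400 Hz sampling rate comes from the thread; the
RateEstimator class, its window size and the final_rate helper are
hypothetical names made up for the example. It shows two runs that end
on the same sensor reading but produce different outputs, because the
output depends on earlier readings as well as the current one.

```python
from collections import deque

SAMPLE_RATE_HZ = 400  # sampling rate quoted in the thread


class RateEstimator:
    """Hypothetical controller fragment: estimates rate of change
    from a short history of readings."""

    def __init__(self, window=4):
        # The history variable: this is the state that makes the
        # component depend on more than its current input.
        self.history = deque(maxlen=window)

    def update(self, reading):
        self.history.append(reading)
        if len(self.history) < 2:
            return 0.0
        intervals = len(self.history) - 1
        # Change over the window divided by the window duration
        # (intervals / SAMPLE_RATE_HZ seconds).
        return (self.history[-1] - self.history[0]) * SAMPLE_RATE_HZ / intervals


def final_rate(readings):
    """Feed a sequence of readings and return the last rate estimate."""
    estimator = RateEstimator()
    rate = 0.0
    for reading in readings:
        rate = estimator.update(reading)
    return rate


# Both sequences end on the same reading (3.0), yet the outputs differ.
rising = final_rate([0.0, 1.0, 2.0, 3.0])
falling = final_rate([6.0, 5.0, 4.0, 3.0])
print(rising, falling)
```

If a wrong output can depend on the preceding readings in this way,
successive executions are not independent trials.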
>>> But keep in mind you can't be letting [the OS] fail. For SIL 4 safety
>>> functions, it has to be running more than 100 million operating hours
>>> between failures on average. That is the constraint from 61508-1
>>> Table 3, which is independent of any means of describing the failure
>>> behaviour.
>>
>> Understood, but I wonder a bit about the numbers in the table. Can you
>> (or anyone on the list) help me understand how the committee arrived
>> at 10^-5, 10^-6, 10^-7, 10^-8 as targets?
>
> (1) There is no theoretical reason why powers of 10 are chosen.
>
> (2) They come from the aerospace regulations, and the "accepted means
> of compliance". The regs contain certain powers of ten for "hazardous
> condition" and "catastrophic condition" and sometimes other hazard
> classes ("minor" and "major") and the AMC nowadays interprets phrases
> such as "not expected to occur within the lifetime of the aircraft
> [fleet]" into probabilities expressed in powers of ten. The reason is
> likely that civil air transport was having continual and improving
> success with what in effect turns out to be its risk matrix, for half a
> century before 61508 came along.
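For reference, the arithmetic behind the "100 million operating hours"
figure can be sketched as below. This assumes the four targets quoted
above are the upper bounds on average dangerous failure frequency per
hour for SIL 1 to SIL 4 in continuous mode, and that the failure rate
is treated as constant so that the mean time between failures is its
reciprocal. The table and function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365

# Assumed mapping: SIL -> n, where the target is fewer than 10^-n
# failures per operating hour. Exponents are stored rather than floats
# so the reciprocal is exact.
SIL_RATE_EXPONENT = {1: 5, 2: 6, 3: 7, 4: 8}


def min_mean_hours_between_failures(sil):
    """Reciprocal of the assumed upper-bound failure rate, in hours."""
    return 10 ** SIL_RATE_EXPONENT[sil]


def hours_to_years(hours):
    """Convert continuous operating hours to years of operation."""
    return hours / HOURS_PER_YEAR


for sil in sorted(SIL_RATE_EXPONENT):
    hours = min_mean_hours_between_failures(sil)
    print(f"SIL {sil}: more than {hours:,} h "
          f"(about {hours_to_years(hours):,.0f} years of continuous operation)")
```

For SIL 4 this gives 10^8 hours, which is on the order of ten thousand
years of continuous operation for a single unit.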
Super, Peter - thanks again
br
Paul