[SystemSafety] Comparing reliability predictions with reality

Prof. Dr. Peter Bernard Ladkin ladkin at causalis.com
Mon Feb 24 19:38:32 CET 2025


On 2025-02-24 18:00 , Robert P Schaefer wrote:
> My claim is, this is a very difficult business to get right safely, consistently, and in such a manner that those who come after can learn from those who came before.

I agree with that.

But some have done it.

> Not so much standing on the shoulders of giants.

Absolutely standing on the shoulders of giants.

Example. Jim Gray won the Turing Award in 1998 for his work on distributed database transactions. 
But he didn't solve the distributed serialisation problem under unreliable communications - 3-phase 
commit was still standard until Leslie Lamport solved the problem in 1989 with the first Paxos 
algorithm. The entire WWW, from Amazon onwards, is now dependent on such algorithms for its 
functioning. Leslie won the Turing Award in 2013 for, amongst other things, his work on Paxos. 
Leslie's Paxos algorithms have all been proved correct through use of TLA+.

You might want to say that this is not the same thing as safety software, say in civil aerospace. 
But it is. When DAL A software fails, the evidence is strewn over the landscape in parts (or sunk in 
the ocean). There is, in Derek's sense, publicly available evidence of failure. So, tell me, how 
many such occasions have there been in the last twenty years when DAL A software has failed?

PBL

Prof. Dr. Peter Bernard Ladkin
Causalis Limited/Causalis IngenieurGmbH, Bielefeld, Germany
Tel: +49 (0)521 3 29 31 00


