[SystemSafety] Software reliability (or whatever you would prefer to call it)

Wed Mar 11 08:50:41 CET 2015

Dear Mr. Ladkin,

On 10/03/2015 12:16, Peter Bernard Ladkin wrote:
> The standard resolution of questions as to whether math is correctly or incorrectly used is to
> provide a proof or a counterexample.

I'll try a counter-example ;-) or, more exactly, I would like to see how
your reasoning on software reliability and probabilities applies to the
following example.

Consider the reuse of an Inertial Reference System module from Ariane 4
on Ariane 501.

In Ariane 4 this module was working flawlessly. There was a track record 
for its use, i.e. several successful flights (and I assume numerous 
tests before those flights).

In Ariane 501 this module was reused, I assume mostly unmodified,
because it worked very well in Ariane 4. However, on flight 501 an
out-of-expected-range input value (horizontal velocity) triggered an
out-of-range exception in a float-to-integer conversion routine. This
exception resulted in a diagnostic pattern appearing on a bus, which was
misinterpreted by another software module; this led to an incorrect
trajectory for Ariane 501 and thus to its explosion.

(All details in: http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf)
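
To make the mechanism concrete, here is a minimal Python sketch of an
unprotected float-to-integer conversion next to a protected one. It is
not the flight code: the 16-bit signed target, the function names and
the sample values are assumptions chosen only for illustration.

```python
# Illustrative only: assumed 16-bit signed target, hypothetical names.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Convert to a 16-bit signed integer; raise if the value does not fit."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Protected variant: clamp to the representable range instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    print(to_int16_unchecked(1000.0))    # within the assumed range
    print(to_int16_saturating(40000.0))  # clamped to INT16_MAX
    try:
        to_int16_unchecked(40000.0)      # outside the assumed range
    except OverflowError as exc:
        print("exception:", exc)
```

With inputs that always stay within the assumed range, both variants
behave identically, so no amount of testing on those inputs can tell
them apart.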

For me, until Ariane 501, the module could be characterized as pretty
reliable with a high probability.

Then, with Ariane 501, the same module exhibited a very low reliability
due to a systematic software failure in the new environment. The
systematic failure was already latent in Ariane 4, but was never
triggered because the horizontal velocity stayed in the expected range.
Moreover, the overall software architecture also had a potential failure
because a diagnostic pattern could be interpreted as a value. Once
again, in Ariane 4 such a scenario apparently did not occur.
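
The dependence on the environment can be sketched with a toy simulation.
Everything in it is invented for illustration (the limit, the two input
distributions, the function names); it only shows that the same
unchanged module yields very different observed failure frequencies
under two different input profiles.

```python
import random

# Hypothetical limit above which the module fails (illustrative value).
LIMIT = 32767.0


def module_ok(peak_velocity: float) -> bool:
    """The module works if and only if the input stays within the limit."""
    return abs(peak_velocity) <= LIMIT


def observed_failure_rate(profile, runs: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated runs that fail under the given input profile."""
    rng = random.Random(seed)
    failures = sum(not module_ok(profile(rng)) for _ in range(runs))
    return failures / runs


def old_profile(rng: random.Random) -> float:
    """Assumed old environment: peak input always below the limit."""
    return rng.uniform(0.0, 20_000.0)


def new_profile(rng: random.Random) -> float:
    """Assumed new environment: peak input often above the limit."""
    return rng.uniform(0.0, 60_000.0)


if __name__ == "__main__":
    print("old environment:", observed_failure_rate(old_profile))
    print("new environment:", observed_failure_rate(new_profile))
```

In this sketch the track record gathered under the old profile contains
no failure at all, yet it says nothing about behaviour under the new
profile.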

Would your proposal on Annex D address this case? Under which conditions
can a software module be assessed as reliable and reused?

Sincerely yours,
D. Mentré