[SystemSafety] Comparing reliability predictions with reality
Derek M Jones
derek at knosof.co.uk
Mon Feb 24 18:12:24 CET 2025
Bev,
> Perhaps you have not been looking in the right places, Derek!
Perhaps I need to go back and reread your papers,
rather than thinking: oh yes, I've already seen that lots of times.
One of my favorites is:
"Theories of Software Reliability: How Good Are They
and How Can They Be Improved?"
which clearly lays out the problem that needs to be
solved and the issues around it.
> The easiest read for this stuff is in Michael Lyu’s old book (old, but still one of the best introductory accounts of this kind of material I believe):
It's available here
https://www.cse.cuhk.edu.hk/~lyu/book/reliability/
> ‘Techniques for prediction analysis and recalibration’ (with S Brocklehurst), Chapter 4 of The Handbook of Software Reliability Engineering (Ed. Michael Lyu), McGraw-Hill, New York, 1995, pp 119-166. (I think it is still available for free download)
Figures 4.2 and 4.3 show what I am looking for, and
suggest that fault discovery for this program has not
yet settled down enough for the predictions to have any
useful degree of accuracy.
https://www.cse.cuhk.edu.hk/~lyu/book/reliability/pdf/Chap_4.pdf
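To make the "settled down" point concrete, the kind of check I have
in mind looks something like the rough sketch below (my own toy
example, not anything taken from Lyu's chapter; the failure times are
made up and the Goel-Okumoto NHPP is just a convenient stand-in for
whichever model is actually being used). Refit after each new failure
and watch whether the one-step-ahead predicted median keeps jumping
around:

    import numpy as np
    from scipy.optimize import minimize

    def go_neg_loglik(params, times):
        # Negative log-likelihood of the Goel-Okumoto NHPP with mean
        # value function m(t) = a*(1 - exp(-b*t)), for the failure
        # times observed up to T = times[-1].
        a, b = params
        if a <= 0 or b <= 0:
            return np.inf
        T = times[-1]
        loglik = np.sum(np.log(a * b) - b * times) - a * (1.0 - np.exp(-b * T))
        return -loglik

    def predicted_median_next(times):
        # Fit to the failures seen so far and return the predicted
        # median waiting time until the next failure.
        res = minimize(go_neg_loglik, x0=[1.5 * len(times), 1.0 / times[-1]],
                       args=(times,), method="Nelder-Mead")
        a, b = res.x
        remaining = a * np.exp(-b * times[-1])  # expected failures still to come
        if remaining <= np.log(2):              # model says the median is infinite
            return np.inf
        return -np.log(1.0 - np.log(2) / remaining) / b

    # Made-up cumulative failure times; plotting medians against the
    # failure number shows whether the predictions have settled down.
    failure_times = np.cumsum(np.random.exponential(10.0, size=40)
                              * np.linspace(1.0, 3.0, 40))
    medians = [predicted_median_next(failure_times[:i]) for i in range(10, 41)]

A plot of those successive medians is, in effect, a poor man's version
of the two figures.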
> ‘Recalibrating software reliability models’, (with S Brocklehurst, P Y Chan, J Snell), IEEE Trans Software Engineering, Vol 16, No 4, pp 458-470, April 1990.
I recently discovered that I did not have part II of Brocklehurst's
thesis, the bit that contains the data!
> Very briefly, there are two main tools: u-plots, and prequential likelihood comparisons. The first of them is a kind of ‘absolute’ assessment of a single model's accuracy, based on a sequence of (prediction, outcome)-pair comparisons. It turns out that you can use this - somewhat surprisingly - to improve a model’s predictions, essentially by allowing it to learn from its past ‘errors’. The second is a means of comparing competing models to select the best (for a particular data set - i.e. sequence of inter-failure times).
I have read a bit about the u-plot idea. The main issue
I have with it is that larger datasets are needed to validate
it. It's an idea in waiting, like most of software reliability.
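For what it's worth, here is my reading of the two tools as a sketch
(my paraphrase, not code from the chapter; it assumes the
model-specific part, a predictive CDF F_i and density f_i for each
inter-failure time, already exists):

    import numpy as np

    def u_plot_distance(u_values):
        # u_values[i] = F_i(t_i): each one-step-ahead predictive CDF
        # evaluated at the inter-failure time actually observed. If the
        # predictions are good these should look like a uniform(0,1)
        # sample; the u-plot is their empirical CDF against the line of
        # unit slope, and the Kolmogorov distance between the two
        # summarises the departure from uniformity.
        u = np.sort(np.asarray(u_values, dtype=float))
        n = len(u)
        above = np.max(np.arange(1, n + 1) / n - u)  # ECDF above the diagonal
        below = np.max(u - np.arange(0, n) / n)      # ECDF below the diagonal
        return max(above, below)

    def prequential_log_likelihood(density_at_obs):
        # density_at_obs[i] = f_i(t_i): the predictive density at the
        # observed value. Competing models are compared on the same
        # failure data by this sum; the larger value is preferred.
        return float(np.sum(np.log(density_at_obs)))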
> I have not worked on reliability growth models for many years. It became a cottage industry producing new models with small tweaks. In fact you will still find papers on it in, e.g. IEEE Trans Reliability. But it long ago seemed to me that only very minor benefits in model accuracy were being obtained. And the techniques I have described above could often be used to check accuracy by applying several of the existing models.
Your paper
"Conceptual Modeling of Coincident Failures in Multiversion Software"
cites Nagel's work (via your coauthor, I assume).
Do you have any thoughts on why Nagel and coworkers' approach never
took off?
That is, having people write programs and then running multiple sets
of tests on them; rather like fuzzing today.
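As a toy illustration of what I mean (hypothetical versions and a
hypothetical oracle standing in for the real experiment; this is not
Nagel's actual protocol):

    import random

    def coincident_failure_counts(versions, oracle, n_inputs=10000, seed=1):
        # Run the same random inputs through every version, tallying
        # per-version failures and how many versions failed together
        # on each input.
        rng = random.Random(seed)
        per_version = [0] * len(versions)
        failed_together = [0] * (len(versions) + 1)
        for _ in range(n_inputs):
            x = rng.randint(-10**6, 10**6)    # random test input
            expected = oracle(x)
            wrong = [v(x) != expected for v in versions]
            for i, w in enumerate(wrong):
                per_version[i] += w
            failed_together[sum(wrong)] += 1
        return per_version, failed_together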
> I agree with you, though, that there continues to be a dearth of published data.
I'm optimistic that fuzzing data can help fill the void.
The main problem I have at the moment is that when I email
authors for data, they have not recorded much. They are
primarily interested in improving fuzzing, not researching
reliability.
--
Derek M. Jones Evidence-based software engineering
blog: https://shape-of-code.com