[SystemSafety] Difference between software reliability and astrology

Prof. Dr. Peter Bernard Ladkin ladkin at causalis.com
Thu Aug 15 09:31:41 CEST 2024


On 2024-08-14 23:26, Derek M Jones wrote:
> All,
>
>> For the current readership, perhaps you'd care to restate the arguments that you claim "debunk" 
>> the observations therein?
>
> Let's start with the use of a Bernoulli process in the
> analysis of fault experiences.
>
> A Bernoulli process involves an event that occurs with
> some fixed probability, p.
> The probability of this event not occurring is q=(1-p)
> https://en.wikipedia.org/wiki/Bernoulli_distribution
>
> A fault is experienced when some combination of input to
> a program is combined with one or more mistakes in the code.

Dear me. First of all, standard terminology. You don't experience an event called a "fault", you 
experience an event called a *failure*. "Fault" is generally taken to be the causal origin of the 
failure. In this simple case, the fault would usually be taken to be the errors in the code, and 
most software people prefer the term "software error" to "software fault". Second, you don't specify 
the nature of the output. You can have inputs proceeding through "mistakes" in the code and still 
yielding correct results. Third, you presume there are phenomena in the code called "mistakes". 
Likely there are, but (assuming you are talking about a failed function execution) you cannot 
necessarily locate a failed execution to something which you can call a "mistake".

Locating failed executions to "mistakes" in the code means you have to reify "mistakes" and I doubt 
you can do that. Maybe the particular combination of inputs was not covered by the code 
requirements. There's no "mistake" in the code; you just get the wrong answer. And what about 
Byzantine Failures? There are not necessarily any "mistakes" in any of the code running on any of 
the nodes; even if there are, they don't contribute to the phenomenon.

> If we take one particular coding mistake, there can be multiple
> sets of inputs that produce the fault experience, and these
> multiple sets of inputs occur with various probabilities.

How do you assign the "probabilities" you wish to associate with input sets?

Indeed, what is an "input set"? Is it just a vector of individual values? If so, why not just call 
it "input value"?

> The probability of experiencing a fault is the sum of the
> probabilities of these various fault inducing input sets.
> This distribution is known as a Poisson binomial distribution
> https://en.wikipedia.org/wiki/Poisson_binomial_distribution

That's another reason why focusing attention on "mistakes" in code, and trying to quantify 
failure/success per "mistake", is not a useful approach. Even assuming inputs have "probabilities", 
you end up modelling with a distribution that is harder to handle than the binomial. There is of 
course no need to do that, as "Software, the Urn Model and Failure" shows. Why make things harder?
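
To make the point concrete, here is a minimal Python sketch. The failure-inducing input regions 
and their probabilities are entirely hypothetical, and the assumption that the regions are 
disjoint is mine; the point is only that, for per-demand failure behaviour under a fixed 
operational profile, a single Bernoulli parameter p (the total probability mass of the 
failure-inducing inputs) is all the model needs.

import random

# Hypothetical, disjoint failure-inducing input regions and the
# (made-up) probabilities of hitting each one on a single demand.
p_i = [1e-4, 3e-5, 2e-6]
p = sum(p_i)   # aggregate per-demand failure probability

def demand_fails(rng: random.Random) -> bool:
    """Model one demand as a single Bernoulli trial with parameter p."""
    return rng.random() < p

rng = random.Random(2024)
n = 1_000_000
failures = sum(demand_fails(rng) for _ in range(n))
print(f"observed per-demand failure rate ~ {failures / n:.2e} (expected {p:.2e})")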

> Now the urn model
>
> The urn model, or Polya urn model to give it its full name, 

The urn model goes back to Jacob Bernoulli's 1713 Ars Conjectandi; the Polya urn is a specific, 
much later, variant.

> involves an urn containing some number of balls of various colors.
> A ball is drawn, its color noted, and that ball along with
> a ball having the same color are returned to the urn.
>
> In the urn model, drawing, say, a black ball increases
> the probability of a black ball being drawn later (because
> the first draw causes an extra black ball to be added to
> the urn).
> https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model

Yes, and? Lots of people here know the urn model. What do you want to say about it?
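
For concreteness, here is a minimal Python sketch of the Polya urn as described above (the 
colours and starting counts are illustrative only): each draw returns the drawn ball plus one 
extra ball of the same colour, so early draws reinforce themselves.

import random

def polya_urn(draws: int, rng: random.Random) -> list[str]:
    """Simulate a Polya urn: draw a ball, note its colour, and return it
    to the urn together with one extra ball of the same colour."""
    urn = ["black", "white"]   # start with one ball of each colour
    history = []
    for _ in range(draws):
        ball = rng.choice(urn)   # draw uniformly at random
        urn.append(ball)         # reinforcement: one extra ball of that colour
        history.append(ball)
    return history

rng = random.Random(1)
history = polya_urn(1000, rng)
print("fraction black after 1000 draws:", history.count("black") / 1000)

That reinforcement is what distinguishes the Polya urn from drawing with simple replacement, 
i.e. from a sequence of independent, identically distributed trials.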

PBL

Prof. Dr. Peter Bernard Ladkin
Causalis Limited/Causalis IngenieurGmbH, Bielefeld, Germany
Tel: +49 (0)521 3 29 31 00
