[SystemSafety] Software reliability (or whatever you would prefer to call it)

Nick Tudor njt at tudorassoc.com
Tue Mar 10 12:17:40 CET 2015


Hi Bev - re definitions:

I have been contemplating this overnight and have come to the same
conclusion as Michael Holloway, who put it far more succinctly than I could.

Hope that helps

Nick Tudor
Tudor Associates Ltd
Mobile: +44(0)7412 074654
www.tudorassoc.com

*77 Barnards Green Road*
*Malvern*
*Worcestershire*
*WR14 3LR*
*Company No. 07642673*
*VAT No:116495996*

*www.aeronautique-associates.com <http://www.aeronautique-associates.com>*

On 9 March 2015 at 18:19, Littlewood, Bev <Bev.Littlewood.1 at city.ac.uk>
wrote:

>  Hi Nick
>
>  The distance between us seems to be diminishing...
>
>  On 9 Mar 2015, at 17:52, Nick Tudor <njt at tudorassoc.com> wrote:
>
> Hi Bev
>
>  My objection still stands and yes, you have parsed incorrectly. Slowly
> is right, and yes, we do agree that the environment is potentially,
> likely, almost certainly random. It's just that small, simple detail that
> the software is not, and hence does not have a "reliability". If you could
> agree to that, then maybe we are getting somewhere.
>
>
>  So how about “the reliability of software in its environment”? Or “the
> reliability of a (software, environment) pair”?
>
>  Such terminology may be a bit clumsy for everyday use, but captures what
> I understand. How about you?
>
>
>  I have no objection to conservatism in safety systems, software-based or
> otherwise, but I do object to bad advice which forces conservatism to be
> unsoundly justified and hence too costly. I have read the document to
> which you refer, among others from the ONR. I know that this document, and
> others, have been based upon interested parties' views and have
> unfortunately become lore, if not 'law'. In this instance, just because
> it's in the document doesn't make it right, just unjustifiably conservative.
>
>
>  Your first sentence sounds like rhetoric, rather than something you
> could support rationally in terms of quantified risks of design basis
> accidents. Again, this is an area where serious thought has gone into the
> engineering requirements (how reliable? how safe?). It would be
> irresponsible to dismiss the implications for system builders with charges
> of “too costly” without some equally serious analysis. I suspect you have
> neither the means nor the inclination to do that. As it is, it seems like a
> bit of Thatcherite, anti-regulation rhetoric!
>
>  Cheers
>
>  Bev
>
>
>
>
> On Monday, 9 March 2015, Littlewood, Bev <Bev.Littlewood.1 at city.ac.uk>
> wrote:
>
>> Hi Nick
>>
>>  On 9 Mar 2015, at 10:14, Nick Tudor <njt at tudorassoc.com> wrote:
>>
>>  Hi Bev
>>
>>  The input you have given to support Peter is the same as what you have
>> been [wrongly] saying for over 30 years.
>>
>>
>>  40-odd years in fact: the first paper was in 1973. And all that time
>> under the rigorous scrutiny of scientific peer review…:-)
>>
>>  But this exchange makes it seem like yesterday.
>>
>>  I just read your latest posting in which you say “...it is the
>> environment that is random rather than the software.”  Exactly. In fact, as
>> I put it in my posting:
>>
>>    The main source of uncertainty lies in software’s interaction with
>>> the world outside. There is inherent uncertainty about the inputs it will
>>> receive in the future, and in particular about when it will receive an
>>> input that will cause it to fail.
>>>
>>   So the software encounters faults randomly, so the software fails
>> randomly, so the failure process is random, i.e. *it is a stochastic
>> process*. We are getting there, slowly. Can I take it that you withdraw
>> your objections and we are now in agreement?
>>
>>  Simple, really, isn’t it?
>>
>>  Your comment about the UK nuclear sector is puzzling. I and my
>> colleagues have worked with them (regulators and licensees) for 20 years
>> (and still do). I have nothing but admiration for the technical competence
>> and sense of responsibility of the engineers involved. If by “holding back”
>> the sector you are referring to their rather admirable technical
>> conservatism when building critical computer-based systems, I can only
>> disagree with you. But it is probably their insistence on assessing the
>> reliability of their critical software-based systems that prompts your ire.
>> You are wrong, of course: read “The tolerability of risk from nuclear power
>> stations” (http://www.onr.org.uk/documents/tolerability.pdf)
>>
>>  Cheers
>>
>>  Bev
>>
>>  PS I was rather amused to see the following on your website: "Tudor
>> Associates is a consultancy that specialises in assisting companies for
>> whom safe, reliable systems and software are critical.” Am I parsing that
>> wrong? Don’t the adjectives apply to software? Or have you changed your
>> mind?
>>
>>
>>
>>   The best example of this is "Execution of software is thus a
>> *stochastic* (random) *process*". No, it isn't, and you said so in the
>> earlier part of the section: "It is true, of course, that software fails
>> systematically, in the sense that if a program fails in certain
>> circumstances, it will *always* fail when those circumstances are
>> exactly repeated".
>>
>>  So it either works [in the context of use] or it doesn't.
>>
>>  Coming up with some apparent pattern of behaviour, which you claim is
>> random because of an indeterminate world, does not make the software
>> execution in any way random.  It merely acknowledges that the world is a
>> messy place, which we all knew anyway.  Your MS example is yet another
>> attempt to justify the approach.  MS had so many issues because there were
>> an indeterminate number of 3rd-party applications which overwrote MS .dlls,
>> causing unforeseen effects.  Well, shock! It apparently failed randomly;
>> that doesn't support your argument in any way whatsoever.
>>
>>  I too have a day job and hence I cannot pick apart all of your
>> arguments, but could easily do so (and have done in previous posts).  So
>> I'll cut to the chase.
>>
>>  In my view, the reason so many have commented on the list is that the
>> kind of thinking espoused regarding so-called "software reliability" costs
>> industry and taxpayers money, and it is frustrating to have such thinking
>> written into standards which ill-informed users, such as those in
>> government, take as read.  This kind of thinking has held back, and
>> continues to hold back, the UK nuclear sector, for example, and, as I
>> wrote in an earlier posting, I would rather the whole subject were removed
>> entirely from the standard. If it is not possible to remove it entirely
>> (which should be possible), then there should be a very clearly written
>> disclaimer which emphasises that not everyone believes that the approach
>> is viable, and that it is left to the developer to propose the manner in
>> which software can be shown to be acceptably safe without having to use
>> "software reliability" as a method to justify the contribution of software
>> to system safety.
>>
>>  Going back [again] to the day job
>>
>>  Regards
>>
>>
>>  Nick Tudor
>> Tudor Associates Ltd
>> Mobile: +44(0)7412 074654
>> www.tudorassoc.com
>>
>>  *77 Barnards Green Road*
>> *Malvern*
>> *Worcestershire*
>> *WR14 3LR*
>> * Company No. 07642673*
>> *VAT No:116495996*
>>
>>  *www.aeronautique-associates.com
>> <http://www.aeronautique-associates.com/>*
>>
>> On 8 March 2015 at 14:03, Littlewood, Bev <Bev.Littlewood.1 at city.ac.uk>
>> wrote:
>>
>>> As I am the other half of the authorial duo that has prompted this
>>> tsunami of postings on our list, my friends may be wondering why I’ve kept
>>> my head down. Rather mundane reason, actually - I’ve been snowed under with
>>> things happening in my day job (and I’m supposed to be retired…).
>>>
>>>  So I’d like to apologise to my friend and co-author of the offending
>>> paper, Peter Ladkin, for leaving him to face all this stuff alone. And I
>>> would like to express my admiration for his tenacity and patience in
>>> dealing with it over the last few days. I hope others on this list
>>> appreciate it too!
>>>
>>>  I can’t respond here to everything that has been said, but I would
>>> like to put a few things straight.
>>>
>>>  First of all, the paper in question was not intended to be at all
>>> controversial - and indeed I don’t think it is. It has a simple purpose: to
>>> clean up the currently messy and incoherent Annex D of 61508. Our aim here
>>> was not to innovate in any way, but to take the premises of the original
>>> annex, and make clear the assumptions underlying the (very simple)
>>> mathematics/statistics for any practitioners who wished to use it. The
>>> technical content of the annex, such as it is, concerns very simple
>>> Bernoulli and Poisson process models for (respectively) on-demand (discrete
>>> time) and continuous time software-based systems. Our  paper addresses the
>>> practical concerns that a potential user of the annex needs to address - in
>>> order, for example, to use the tables there. Thus there is an extensive
>>> discussion of the issue of state, and how this affects the plausibility of
>>> the necessary assumptions needed to justify claims for Bernoulli or Poisson
>>> behaviour.
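>>>
>>>  To make that concrete, here is a minimal sketch (in Python, with
>>> illustrative numbers of my own rather than figures taken from the annex)
>>> of the kind of calculation that sits behind such tables: how much
>>> failure-free evidence is needed to support a claim at a given confidence,
>>> under the Bernoulli and Poisson assumptions respectively.
>>>
>>> import math
>>>
>>> def demands_needed(pfd_claim, confidence):
>>>     """Failure-free demands needed to support 'pfd <= pfd_claim' at the
>>>     stated confidence (Bernoulli model, zero failures observed)."""
>>>     return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - pfd_claim))
>>>
>>> def hours_needed(rate_claim, confidence):
>>>     """Failure-free operating hours needed for the analogous claim on a
>>>     failure rate (Poisson model, zero failures observed)."""
>>>     return -math.log(1.0 - confidence) / rate_claim
>>>
>>> print(demands_needed(1e-4, 0.99))   # about 46,050 failure-free demands
>>> print(hours_needed(1e-4, 0.99))     # about 46,052 failure-free hours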
>>>
>>>  Note that there is no advocacy here. We do not say “Systems
>>> necessarily fail in Bernoulli/Poisson processes, so you must assess their
>>> reliability in this way”. Whilst these are, we think, plausible models for
>>> many systems, they are clearly not applicable to all systems. Our concern
>>> was to set down what conditions a user would need to assure in order to
>>> justify the use of the results of the annex. If his system did not satisfy
>>> these requirements, then so be it.
>>>
>>>  So why has our innocuous little offering generated so much steam?
>>>
>>>  Search me. But reading some of the postings took me back forty years.
>>> “There’s no such thing as software reliability.” "Software is deterministic
>>> (or its failures are systematic) therefore probabilistic treatments are
>>> inappropriate.” Even, God help us, “Software does not fail.” (Do these
>>> people not use MS products?) “Don’t bother me with the science, I’m an
>>> *engineer* and I know what’s what” (is that an unfair caricature of a
>>> couple of the postings?). “A lot of this stuff came from academics, and we
>>> know how useless and out-of-touch with the real world they are (scientific
>>> peer-review? do me a favour - just academics talking to one another)”. Sigh.
>>>
>>>  Here are a few comments on a couple of the topics of recent
>>> discussions. Some of you may wish to stop reading here!
>>>
>>>  *1 Deterministic, systematic…and stochastic. *
>>>
>>>  Here is some text I first used thirty years ago (only slightly
>>> modified). This is not the first time I’ve had to reuse it in the
>>> intervening years.
>>>
>>> "It used to be said – in fact sometimes still is – that 'software
>>> failures are systematic *and therefore it does not make sense to talk
>>> of software reliability'*. It is true, of course, that software fails
>>> systematically, in the sense that if a program fails in certain
>>> circumstances, it will *always* fail when those circumstances are
>>> exactly repeated. Where then, it is asked, lies the uncertainty that
>>> requires the use of probabilistic measures of reliability?
>>>
>>> "The main source of uncertainty lies in software’s interaction with the
>>> world outside. There is inherent uncertainty about the inputs it will
>>> receive in the future, and in particular about when it will receive an
>>> input that will cause it to fail. Execution of software is thus a
>>> *stochastic* (random) *process*. It follows that many of the classic
>>> measures of reliability that have been used for decades in hardware
>>> reliability are also appropriate for software: examples include *failure
>>> rate* (for continuously operating systems, such as reactor control
>>> systems); *probability of failure on demand (pfd)* (for demand-based
>>> systems, such as reactor protection systems); *mean time to failure*;
>>> and so on.
>>> "This commonality of measures of reliability between software and
>>> hardware is important, since practical interest will centre upon the
>>> reliability of *systems* comprising both. However, the mechanism of
>>> failure of software differs from that of hardware, and we need to
>>> understand this in order to carry out reliability evaluation.”  (it
>>> goes on to discuss this - no room to do it here)
>>>
>>>  At the risk of being repetitive: The point here is that uncertainty -
>>> "aleatory uncertainty" in the jargon - is an inevitable property of the
>>> failure process. You cannot eliminate such uncertainty (although you may be
>>> able to reduce it). The only candidate for a quantitative calculus of
>>> uncertainty is probability. Thus the failure process is a stochastic
>>> process.
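>>>
>>>  For anyone who prefers code to prose, here is a toy sketch of that
>>> argument (the program, its fault region and the demand profile are all
>>> invented for illustration). The program below is completely deterministic:
>>> it fails on exactly the same inputs every time. Yet, because its inputs
>>> arrive from an uncertain environment, the failure process observed in
>>> operation is a stochastic (here Bernoulli) process.
>>>
>>> import random
>>>
>>> def program(x):
>>>     # Deterministic: always wrong for x in [0.9995, 1.0), correct otherwise.
>>>     return "fail" if 0.9995 <= x < 1.0 else "ok"
>>>
>>> random.seed(1)
>>> demands = (random.random() for _ in range(1_000_000))   # the random environment
>>> failures = sum(1 for x in demands if program(x) == "fail")
>>> print(failures / 1_000_000)   # empirical pfd, close to the 'true' value 0.0005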
>>>
>>>  Similar comments to the above can be made about “deterministic” as
>>> used in the postings. Whilst this is, of course, an important and useful
>>> concept, it has nothing to do with this particular discourse.
>>>
>>>  *2. Terminology, etc.*
>>>
>>>  Serious people have thought long and hard about this. The
>>> Avizienis-Laprie-Randell-Neumann paper is the result of this thinking. You
>>> may not agree with it (I have a few problems myself), but it cannot be
>>> dismissed after a few moments' thought, as it seems to have been in a couple
>>> of postings. If you have problems with it, you need to engage in serious
>>> debate. It’s called science.
>>>
>>>  *3. You can’t measure it, etc.*
>>>
>>>  Of course you can. Annex D of 61508, in its inept way, shows how - in
>>> those special circumstances that our note addresses in some detail.
>>>
>>>  Society asks “How reliable?”, “How safe?”, “Is it safe enough?”, even
>>> “How confident are you (and should we be) in your claims?” The first three
>>> are claims about the stochastic processes of failures. If you don’t accept
>>> that, how else would you answer? I might accept that you are a good
>>> engineer, working for a good company, using best practices of all kinds -
>>> but I still would not have answers to the first three questions.
>>>
>>>  The last question above raises the interesting issue of epistemic
>>> uncertainty about claims for systems. No space to discuss that here - but
>>> members of the list will have seen Martyn Thomas’ numerous questions about
>>> how confidence will be handled (and his rightful insistence that it
>>> *must* be handled).
>>>
>>>  *4. But I’ll never be able to claim 10^-9….*
>>>
>>>  That’s probably true.
>>>
>>> Whether 10^-9 (probability of failure per hour) is actually *needed*
>>> in aerospace is endlessly debated. But you clearly need *some* dramatic
>>> number. Years ago, talking to Mike deWalt about these things, he said that
>>> the important point was that aircraft safety needed to improve
>>> continuously. Otherwise, with the growth of traffic, we would see more and
>>> more frequent accidents, and this would be socially unacceptable. The
>>> current generation of airplanes are impressively safe, so new ones face a
>>> very high hurdle. Boeing annually provide a fascinating summary of detailed
>>> statistics on world-wide airplane safety
>>> (www.boeing.com/news/techissues/pdf/statsum.pdf). From this you can infer that
>>> current critical computer systems have demonstrated, in hundreds of
>>> millions of hours of operation, something like 10^-8 pfh (e.g. for the
>>> Airbus A320 and its ilk). To satisfy Mike’s criterion, new systems need to
>>> demonstrate that they are better than this. This needs to be done *before*
>>> they are certified. Can it?
>>>
>>>  Probably not. See Butler and Finelli (IEEE Trans Software Engineering,
>>> 1993), or Littlewood and Strigini (Comm ACM, 1993) for details.
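>>>
>>>  As a rough sketch of the inference involved (assuming a Poisson failure
>>> process, no relevant failures observed in service, and a fleet-hours
>>> figure that is purely illustrative rather than taken from the statsum
>>> data):
>>>
>>> import math
>>>
>>> def rate_upper_bound_zero_failures(hours, confidence=0.99):
>>>     """One-sided upper confidence bound on the per-hour failure rate,
>>>     given no failures observed in the stated operating hours."""
>>>     return -math.log(1.0 - confidence) / hours
>>>
>>> # e.g. roughly 3e8 fleet hours with no relevant failures observed
>>> print(rate_upper_bound_zero_failures(3e8))   # about 1.5e-8 per hour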
>>>
>>>  Michael Holloway’s quotes from 178B and 178C address this issue, and
>>> have always intrigued me. The key phrase is "...*currently available
>>> methods do not provide results in which confidence can be placed at the
>>> level required for this purpose…*” Um. This could be taken to
>>> mean: “Yes, we could measure it, but for reasons of practical feasibility,
>>> we know the results would fall far short of what’s needed (say 10^-8ish).
>>> So we are not going to do it.” This feels a little uncomfortable to me.
>>> Perhaps best not to fly on a new aircraft type until it has got a few
>>> million failure-free hours under its belt (as I have heard a regulator say).
>>>
>>>  By the way, my comments here are not meant to be critical of the
>>> industry’s safety achievements, which I think are hugely impressive (see
>>> the Boeing statsum data).
>>>
>>>  *5. Engineers, scientists…academics...and statisticians...*
>>>
>>>  …a descending hierarchy of intellectual respectability?
>>>
>>>  With very great effort I’m going to resist jokes about alpha-male
>>> engineers. But I did think Michael’s dig at academics was a bit below the
>>> belt. Not to mention a couple of postings that appear to question the
>>> relevance of science to engineering. Sure, science varies in quality and
>>> relevance. As do academics. But if you are engineering critical systems it
>>> seems to me you have a responsibility to be aware of, and to use, the best
>>> relevant science. Even if it comes from academics. Even if it is
>>> statistical.
>>>
>>>
>>>  My apologies for the length of this. A tentative excuse: if I’d spread
>>> it over several postings, it might have been even longer…
>>>
>>>  Cheers
>>>
>>>  Bev
>>> _______________________________________________
>>>
>>> Bev Littlewood
>>> Professor of Software Engineering
>>> Centre for Software Reliability
>>> City University London EC1V 0HB
>>>
>>> Phone: +44 (0)20 7040 8420  Fax: +44 (0)20 7040 8585
>>>
>>> Email: b.littlewood at csr.city.ac.uk
>>>
>>> http://www.csr.city.ac.uk/
>>> _______________________________________________
>>>
>>>
>>> _______________________________________________
>>> The System Safety Mailing List
>>> systemsafety at TechFak.Uni-Bielefeld.DE
>>>
>>>
>>
>> _______________________________________________
>>
>> Bev Littlewood
>> Professor of Software Engineering
>> Centre for Software Reliability
>> City University London EC1V 0HB
>>
>> Phone: +44 (0)20 7040 8420  Fax: +44 (0)20 7040 8585
>>
>> Email: b.littlewood at csr.city.ac.uk
>>
>> http://www.csr.city.ac.uk/
>> _______________________________________________
>>
>>
>
> --
> Nick Tudor
> Tudor Associates Ltd
> Mobile: +44(0)7412 074654
> www.tudorassoc.com
>
>  *77 Barnards Green Road*
> *Malvern*
> *Worcestershire*
> *WR14 3LR*
> * Company No. 07642673*
> *VAT No:116495996*
>
>  *www.aeronautique-associates.com
> <http://www.aeronautique-associates.com/>*
>
>
> _______________________________________________
>
> Bev Littlewood
> Professor of Software Engineering
> Centre for Software Reliability
> City University London EC1V 0HB
>
> Phone: +44 (0)20 7040 8420  Fax: +44 (0)20 7040 8585
>
> Email: b.littlewood at csr.city.ac.uk
>
> http://www.csr.city.ac.uk/
> _______________________________________________
>
>