[SystemSafety] What do we know about software reliability?

Tue Sep 15 15:52:38 CEST 2020

Michael,

>> I don't see any overall benefit in trying to define a new word.
> 
> It is not possible for me to disagree with this sentiment more than I do.
> There is *always* a benefit in replacing a bad word choice with a better
> one. Trying to mitigate a bad choice by providing a precise definition has

Yes, there is always a benefit.  But, there is also always a cost.

"software slithiness" is a  great phrase, but what are the chances of it being
widely adopted?
Might somebody else prefer "software frumiousity", so that we end up with
multiple terms for roughly the same idea?

What does the evidence say?
Does anybody know of any studies of terminology adoption, or not, by
industry?
How long does it take for new terms to become widely used?

> been shown over and over again in many different disciplines to rarely
> work.  Most people when they see a word whose definition they believe they
> know will use the definition they know no matter how clearly or how often a
> different definition is given*. This phenomenon seems to be more often true
> than not even among people who are aware of it, and try really hard to
> overcome it.
> 
> In my opinion, "software reliability" is the second worst (that is,
> misleading, unhelpful, confusing) phrase ever coined within the software
> community**.  To far too many people (myself included) "reliability"
> necessarily includes notions of either randomness (for example, given an
> identical environment, history, design, and manufacturer, component A fails
> but B does not) or degradation over time.  Because neither notion applies
> to conventional software, the phrase "software reliability" is (and always
> will be) to me at best meaningless and at worst misleading.
> 
> With that said, I nevertheless have quite a bit of respect for some of the
> folks who conduct research in the area, particularly those in the UK who
> have been doing it a long time.  I just wish they'd get rid of the name.
> Even a nonsense word, perhaps selected out of Jabberwocky, would be a vast
> improvement.  How about "software slithiness"?
> 
> * Without naming names, I'd suggest that the once frequent intense
> disagreements between two long-time members of the list can be explained to
> a large measure by this phenomenon.
> 
> ** DO-178's "derived requirements" is the worst.  What's a "derived
> requirement"?  A requirement that is *not* derived. I'll resist the
> temptation to broaden "software community" to encompass safety/assurance
> case work, where really bad terminology abounds.
> 
> 
> *--cMh*
> 
> I used to think I was really good at imagining worst case scenarios.
> 
> 
> On Tue, Sep 15, 2020 at 8:57 AM Derek M Jones <derek at knosof.co.uk> wrote:
> 
>> Nick,
>>
>>> As I recall, I have said before on this list, software has no wear out
>>> mechanism so software reliability is somewhat meaningless.  I was widely
>>
>> There is no physical wear out mechanism, but the environment in which
>> software is run can change (the same is true for hardware, but people
>> don't tend to talk about this).
>>
>>> abused (some even said bullied) for suggesting that software reliability
>>> was not the right way of thinking about software assurance.  It is
>>> therefore with some trepidation that I dive into this thread.
>>
>> Some term has to be used.  An existing word comes with lots of baggage, but
>> I don't see any overall benefit in trying to define a new word.
>>
>> The ANSI definition encompasses what needs to be said:
>> "Software Reliability is defined as: the probability of failure-free
>> software operation for a specified period of time
>> in a specified environment."
>>
>> The "specified environment" is what tends to get ignored in most analyses.
>> Ariane flight 501 operated in a different environment, one outside the
>> bounds of prior analysis.
>>
>> It is very difficult to obtain data on the environment in which software
>> runs.
>>
>> For instance, the number of reported faults would be expected to increase
>> with the number of users.  Where is the data?  I have managed to find a
>> few very noisy datasets, and yes, reported faults increase with users.
>>
>> The bi-exponential function keeps cropping up in fuzzing data.  It's very
>> suggestive:
>>
>> https://shape-of-code.coding-guidelines.com/2017/12/12/the-shadow-of-the-input-distribution/
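For readers unfamiliar with the term, a bi-exponential is a sum of two decaying exponentials. The sketch below only shows the functional form; the function name, parameter names and values are made up for illustration and are not taken from any fuzzing dataset.

```python
import math

def bi_exponential(x, a, b, c, d):
    """a*exp(-b*x) + c*exp(-d*x): a fast-decaying term plus a slow-decaying term."""
    return a * math.exp(-b * x) + c * math.exp(-d * x)

# Hypothetical parameters: the fast term dominates early, the slow term late.
params = dict(a=100.0, b=1.0, c=10.0, d=0.05)

for x in (0, 1, 5, 20, 50):
    y = bi_exponential(x, **params)
    # On a log scale the curve looks like two straight-line regimes.
    print(f"x={x:>2}  y={y:8.3f}  log(y)={math.log(y):6.3f}")
```

Plotted with a logarithmic y axis, such a curve shows a steep initial slope (roughly -b) that flattens to a shallow one (roughly -d), which is the usual visual signature of this form.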
>>
>>>
>>> Nick Tudor
>>> Tudor Associates Ltd
>>> Mobile: +44(0)7412 074654
>>> www.tudorassoc.com
>>>
>>> *77 Barnards Green Road*
>>> *Malvern*
>>> *Worcestershire*
>>> *WR14 3LR*
>>> *Company No. 07642673*
>>> *VAT No:116495996*
>>>
>>> *www.aeronautique-associates.com <http://www.aeronautique-associates.com>*
>>>
>>>
>>> On Tue, 15 Sep 2020 at 09:46, Peter Bishop <pgb at adelard.com> wrote:
>>>
>>>> On 14/09/2020 15:04, Martyn Thomas wrote:
>>>>
>>>> Why are you completely dismissing software reliability?
>>>>
>>>> Is it not the case that if you can tolerate a failure rate of once in
>>>> 1000 hours, 99% confidence through testing would take about 200 days to
>>>> demonstrate (so long as the test environment is "sufficiently" like the
>>>> future operating environment and you are able to detect every failure
>>>> correctly)?
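A rough check of the 200-day figure quoted above. This is a minimal sketch, assuming failures arrive at a constant rate (an exponential model), that the testing is failure-free, and that the test profile matches operation. The function name and parameters are illustrative, not from the thread.

```python
import math

def failure_free_test_hours(max_failure_rate_per_hour, confidence):
    """Failure-free test time needed to claim a failure-rate bound.

    Assumes a constant failure rate, so P(no failure in t hours) = exp(-rate * t).
    We need exp(-rate * t) <= 1 - confidence, hence
    t >= -ln(1 - confidence) / rate.
    """
    return -math.log(1.0 - confidence) / max_failure_rate_per_hour

# One failure per 1000 hours, demonstrated at 99% confidence.
hours = failure_free_test_hours(1.0 / 1000.0, 0.99)
days = hours / 24.0
print(f"{hours:.0f} hours, about {days:.0f} days of continuous testing")
```

Under these assumptions the answer is roughly 4600 hours, a little over 190 days of continuous failure-free testing, which is consistent with "about 200 days". The required time scales inversely with the target rate, so each extra order of magnitude in the claim costs ten times the testing.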
>>>>
>>>> And statistical testing is used in the UK nuclear industry for safety
>>>> critical systems, so it is not just abstract theory.
>>>>
>>>> Re your characterisation of confidence based statistical testing on P153
>>>> (with no reference), I do not think it is fair to dismiss this because "p
>>>> can vary by orders of magnitude". Testing presumes a fixed operational
>>>> profile and a constant probability of failure.
>>>>
>>>> There has also been some work on the impact of profile change on the
>>>> bound that can be claimed.
>>>>
>>>>
>>>>
>>>> https://www.researchgate.net/publication/307555914_Deriving_a_frequentist_conservative_confidence_bound_for_probability_of_failure_per_demand_for_systems_with_different_operational_and_test_profiles
>>>>
>>>> BTW, re your summary of my paper on the same page, I think you missed the
>>>> main point. This is a *predictive* theory to derive a worst case bound
>>>> for some time in the future, i.e.
>>>>
>>>> Given N faults, what is the worst possible reliability at some future
>>>> time T?
>>>> - it assumes fault fixing will occur during that time.
>>>>
>>>> You also only presented the theory for N=1, and you seem to assume that T
>>>> has already happened with zero failures (not a requirement for this
>>>> model).
>>>>
>>>> Might have been better to reference the original worst case bound
>>>> version (which makes it clear that it is a long term forward prediction):
>>>>
>>>>
>>>>
>>>> https://www.researchgate.net/publication/3152200_A_conservative_theory_for_long-term_reliability-growth_prediction
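The flavour of a worst case bound of this kind can be illustrated with a small sketch. The assumptions here are mine rather than the thread's: each of the N faults has its own constant failure rate lam, and a fault is removed the first time it causes a failure. A fault then survives to time T with probability exp(-lam*T), so its expected contribution to the failure rate at T is lam*exp(-lam*T), which peaks at lam = 1/T with value 1/(e*T). Summing over N faults gives N/(e*T). I believe this matches the form of the bound in the paper linked above, but check the paper before relying on it; all names below are hypothetical.

```python
import math

def expected_rate_contribution(lam, T):
    """Expected failure-rate contribution at time T of one fault with rate lam.

    The fault is still present at T with probability exp(-lam * T)
    (assuming it is fixed at its first failure).
    """
    return lam * math.exp(-lam * T)

def worst_case_rate_bound(n_faults, T):
    """Upper bound N / (e * T) obtained by taking the worst lam for every fault."""
    return n_faults / (math.e * T)

T = 10_000.0  # hours of prior usage, illustrative value

# Grid search over lam from 0.01/T to 10/T to locate the worst case numerically.
candidates = [k / (100.0 * T) for k in range(1, 1001)]
worst_lam = max(candidates, key=lambda lam: expected_rate_contribution(lam, T))

print("worst lam * T:", worst_lam * T)  # close to 1
print("bound for N=10:", worst_case_rate_bound(10, T))
```

The point of the sketch is that the bound needs no knowledge of the individual fault rates: very large rates are almost certainly already fixed by T, very small rates contribute little, and the worst case sits in between.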
>>>>
>>>> Of course, the testing would have to be repeated following a change to
>>>> the software, unless you have enough formality to show that the change
>>>> cannot affect reliability.
>>>>
>>>> In specific circumstances, you can do better than this. Bev Littlewood's
>>>> published papers provide strong evidence and a rich bibliography. Bev's
>>>> paper on "How reliable is a program that has never failed?" offers a
>>>> useful rule-of-thumb: that after n hours of fault free operation, there
>>>> is about a 50% chance of a failure in the following n hours (subject to
>>>> some obvious constraints).
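One way to arrive at a figure like this 50% (not necessarily the derivation in the paper mentioned above) is a simple Bayesian argument. The assumptions are mine: an unknown but constant failure rate lam, and a uniform (improper) prior on lam. After t failure-free hours the posterior density is t*exp(-lam*t), and the probability of surviving a further s hours works out to t / (t + s). The function names are illustrative.

```python
import math

def prob_survive_next(t_observed, s_future):
    """P(no failure in the next s hours | no failure in the first t hours).

    Closed form under a uniform prior on a constant failure rate:
    integral of exp(-lam*s) * t*exp(-lam*t) d(lam) = t / (t + s).
    """
    return t_observed / (t_observed + s_future)

def prob_survive_next_numeric(t_observed, s_future, steps=200_000):
    """The same quantity by midpoint-rule integration, as a cross-check."""
    upper = 50.0 / t_observed  # posterior mass beyond this rate is negligible
    width = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * width
        total += t_observed * math.exp(-lam * (t_observed + s_future)) * width
    return total

n = 1000.0  # hours of failure-free operation, illustrative value
print("closed form:", prob_survive_next(n, n))
print("numeric    :", round(prob_survive_next_numeric(n, n), 4))
```

With s equal to t the survival probability is one half whatever the value of n, so the chance of a failure in the following n hours is also one half. A different prior would give a different number, which is one reason to treat this as a rule of thumb.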
>>>>
>>>> The difficulties rapidly escalate when you need 10^-4 or better at >90%
>>>> confidence.
>>>>
>>>> Martyn
>>>> On 14/09/2020 14:14, SPRIGGS, John J wrote:
>>>>
>>>> In my experience, if Software Reliability is mentioned at a conference, at
>>>> least one member of the audience will laugh, and if it is mentioned in a
>>>> work discussion, at least one member of the group will get angry.
>>>>
>>>> Interestingly, some of the same people who say it is impossible to
>>>> quantify software failure rates will set numerical requirements for
>>>> Software Availability – if you get one of those, ask the Customer how
>>>> (s)he wants you to demonstrate satisfaction of the requirement.
>>>>
>>>>
>>>>
>>>> John
>>>>
>>>> *From:* systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
>>>> *On Behalf Of* Derek M Jones
>>>> *Sent:* 14 September 2020 12:54
>>>> *To:* systemsafety at lists.techfak.uni-bielefeld.de
>>>> *Subject:* [SystemSafety] What do we know about software reliability?
>>>>
>>>>
>>>>
>>>> All,
>>>>
>>>> What do we know about software reliability?
>>>>
>>>> The answer appears to be, not a lot:
>>>>
>>>>
>>>> http://shape-of-code.coding-guidelines.com/2020/09/13/learning-useful-stuff-from-the-reliability-chapter-of-my-book/
>>>>
>>>> --
>>>> Derek M. Jones Evidence-based software engineering
>>>> tel: +44 (0)1252 520667 blog:shape-of-code.coding-guidelines.com
>>>> _______________________________________________
>>>> The System Safety Mailing List
>>>> systemsafety at TechFak.Uni-Bielefeld.DE
>>>> Manage your subscription:
>>>> https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety
>>>>
>>>>
>>>> --
>>>>
>>>> Peter Bishop
>>>> Chief Scientist
>>>> Adelard LLP
>>>> 24 Waterside, 44-48 Wharf Road, London N1 7UX
>>>>
>>>> Email: pgb at adelard.com
>>>> Tel:  +44-(0)20-7832 5850
>>>>
>>>> Registered office: 5th Floor, Ashford Commercial Quarter, 1 Dover Place,
>>>> Ashford, Kent TN23 1FB
>>>> Registered in England & Wales no. OC 304551. VAT no. 454 489808
>>>>
>>>
>>
>> --
>> Derek M. Jones           Evidence-based software engineering
>> tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com
> 
> 

-- 
Derek M. Jones           Evidence-based software engineering
tel: +44 (0)1252 520667  blog:shape-of-code.coding-guidelines.com