[SystemSafety] Bursting the formal methods bubble
Les Chambers
les at chambers.com.au
Wed Nov 8 11:38:01 CET 2017
I thought I'd support Steve's position on inspections with this little true
story I wrote some time ago, but which I think is still relevant.
ref: http://www.systemsengineeringblog.com/extreme-review-a-tale-of-nakedness-alsatians-and-fagan-inspection/
Cheers
Les
-----Original Message-----
From: systemsafety [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de]
On Behalf Of Steve Tockey
Sent: Tuesday, October 31, 2017 3:45 AM
To: Derek M Jones; systemsafety at lists.techfak.uni-bielefeld.de
Subject: Re: [SystemSafety] Bursting the formal methods bubble
Derek,
“You are saying that, in your line of work, formal inspections are a cost
effective way of doing things.”
Yes.
"I am not disputing this, only that in other domains other techniques may
be needed/used.”
I totally agree.
But remember that what triggered this thread was the topic of N-version
development. My argument all along has been that a) I haven’t seen any
data that would suggest that N-version development is any more effective
at delivering low-defect software than use of inspections, and b) use of
inspections is a lot more efficient (i.e., cost effective) at delivering
low-defect software than N-version.
That techniques other than inspection can and should be used in various
domains, both separately and in conjunction with inspections, is without
question.
― steve
-----Original Message-----
From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
on behalf of Derek M Jones <derek at knosof.co.uk>
Organization: Knowledge Software, Ltd
Date: Sunday, October 29, 2017 at 3:31 PM
To: "systemsafety at lists.techfak.uni-bielefeld.de"
<systemsafety at lists.techfak.uni-bielefeld.de>
Subject: Re: [SystemSafety] Bursting the formal methods bubble
Steve,
> You asked: “Is the problem size an issue? The Brilliant et al. study
> required a mean effort of 47 hours to complete; the NASA study hired
> everybody for 10 weeks.”
>
> I suspect it is not, although I don’t have any solid data to back it up.
>
There are all kinds of reasons why one study did not report
specification faults and the other did: perhaps Knight and Leveson
wrote a more rigorous specification, perhaps their subjects were
not motivated to find such mistakes, or perhaps the mistakes were found
but not reported.
The findings of one study need to be replicated many times before we
can be confident of the circumstances under which they hold.
> You also asked: “Do you have any data on this?” in reference to peer
> reviews.
>
> Yes, there’s a ton of data, particularly on use of Formal Inspections.
> One of the best sources is the following:
These are not data; they are reports of findings that include some
numbers.
> Other good data sources include:
>
> Capers Jones, “Software Quality in 1999: What Works and What Doesn’t.”,
> Software Quality Research Inc., 1999
I find it too hard to tell the difference between Capers Jones'
made-up example numbers, his actual data, and combinations of the two.
http://shape-of-code.coding-guidelines.com/2016/05/11/applied-software-measurement-shame-about-the-made-up-numbers/
> *) Formal inspections are between 60% and 90% effective at finding
> defects (my experience on projects was as high as 95%)
In some domains these numbers are much too far away from 100%.
Other kinds of formal methods (the more expensive kinds) claim to get
a lot closer (in fact some claim to hit 100%).
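To put rough numbers on why that gap matters, here is a back-of-the-envelope
sketch in Python (the injected-defect count is purely illustrative, not a
figure from any study in this thread):

    # Defects escaping a review technique of a given effectiveness.
    # 10,000 injected defects is an illustrative assumption.
    injected = 10_000
    for effectiveness in (0.60, 0.90, 0.95):
        escaped = round(injected * (1 - effectiveness))
        print(f"{effectiveness:.0%} effective: {escaped} defects escape")

Even at 95%, one defect in twenty survives, which in some safety domains
is still far too many.
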
> The level of investment in Formal Inspections varies from project to
> project, but in my personal experience it tends to hover at around 10% of
> total technical labor (I mean, total effort spent on requirements, design,
> construction, developer testing, and inspections combined).
You are saying that, in your line of work, formal inspections are a cost
effective way of doing things.
I am not disputing this, only that in other domains other techniques may
be needed/used.
>
>
> So I would summarize my earlier argument this way:
>
> 1) Assume a typical project using mainstream development practices
> (specifically, not using Formal Inspections) would cost C$. To develop the
> software more than once in an N-version manner would cost N*C$.
>
> 2) In typical organizations about 60% of total project effort is spent on
> unplanned rework to fix defects injected earlier (e.g., in requirements,
> in design). Using Formal Inspections can reduce that unplanned rework
> effort to around 10%. The productive work (0.4*C$) then makes up about
> 80% of the new total once the ~10% rework and ~10% inspection effort are
> added back in, so the total cost of the project when using Formal
> Inspections drops from C$ to about 0.5*C$ (see the sketch after this
> list).
>
> 3) Per the quote from the NASA paper above, “It is not possible to
> quantify from this task the potential benefits of N-version programming
> in specification debugging”, so we can’t reliably compare defect finding
> effectiveness in N-version development with that of Formal Inspections. I
> think it’s safe to say that N-version development is unlikely to be better
> than Formal Inspections. For the sake of argument, let’s assume it’s the
> same, even though IMHO it’s unlikely to be even nearly as good.
>
> 4) Given that in N-version development N > 1, it is clear that N*C$ >>
> 0.5*C$ for the same (or, likely, better) quality out of the project that
> used Formal Inspections.
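>
> To make the arithmetic behind 2) and 4) explicit, here is a minimal
> sketch in Python using only the figures above (the assumption that the
> productive, non-rework work is unchanged by inspections is mine):
>
>     # Cost model built from the figures quoted in this thread.
>     C = 1.0                    # baseline mainstream project cost
>     rework_mainstream = 0.60   # unplanned rework, mainstream practice
>     rework_inspected = 0.10    # unplanned rework with inspections
>     inspection_cost = 0.10     # inspection effort, fraction of total
>
>     # Productive (non-rework) work, assumed the same in both cases:
>     base_work = C * (1 - rework_mainstream)                  # 0.4*C
>
>     # With inspections that work makes up 80% of the new total:
>     total_inspected = base_work / (1 - rework_inspected
>                                    - inspection_cost)        # 0.5*C
>
>     N = 2  # the smallest possible N-version project
>     print(f"inspections: {total_inspected:.2f}*C$  N-version: {N*C:.1f}*C$")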
>
>
>
> Cheers,
>
> ― steve
>
>
>
>
>
> -----Original Message-----
> From: Derek M Jones <derek at knosof.co.uk>
> Organization: Knowledge Software, Ltd
> Date: Thursday, October 26, 2017 at 3:02 PM
> To: Steve Tockey <Steve.Tockey at construx.com>,
> "systemsafety at lists.techfak.uni-bielefeld.de"
> <systemsafety at lists.techfak.uni-bielefeld.de>
> Subject: Re: [SystemSafety] Bursting the formal methods bubble
>
> Steve,
>
>> “Using different sets of people increases the probability that problems
>> will be found in the specification/design, through people having
>> different points of view and asking different questions (I cannot find a
>> DOD N-version study that documented this effect).”
>
> It was a NASA study, see table 4-1:
> https://ntrs.nasa.gov/search.jsp?R=19870020663
>
>> 1) According to the Brilliant, Knight, and Leveson paper there won’t be
>> very much additional defect identification. If I recall correctly, the
>> subjects in their study were all working to the same requirements
>> specification and around 60% of the defects they injected were in common
>> to most developers.
>
> Is the problem size an issue? The Brilliant et al. study required a mean
> effort of 47 hours to complete; the NASA study hired everybody for 10
> weeks.
>
>> 2) The cost of building non-trivial software even only once can be
>> pretty high; building it more than once simply multiplies the cost by
>> the number of versions built. It can be really, really expensive to
>> build parallel versions.
>
> Yes, this is one reason why formal methods (which essentially create
> a whole new implementation) are only used for critical cases.
>
>> 3) The same (or higher) defect FINDING effectiveness could easily be
>> achieved by using an appropriate form of independent peer review, and
>> the cost of running independent peer reviews on one version being built
>> will be significantly less than the cost of building N versions.
>
> Do you have any data on this?
>
>> So I am agreeing that multiple people lead to better work; however, all
>> of the data I have point to the most effective use of most of those
>> people being in a peer review capacity, not in a build-it-more-than-once
>> capacity.
>
> This is one argument. The whole issue is being discussed in an ultra-low
> data environment (there are small specific bits and pieces floating
> around).
>
>>
>>
>> ― steve
>>
>>
>>
>> -----Original Message-----
>> From: systemsafety <systemsafety-bounces at lists.techfak.uni-bielefeld.de>
>> on behalf of Derek M Jones <derek at knosof.co.uk>
>> Organization: Knowledge Software, Ltd
>> Date: Wednesday, October 25, 2017 at 10:57 AM
>> To: "systemsafety at lists.techfak.uni-bielefeld.de"
>> <systemsafety at lists.techfak.uni-bielefeld.de>
>> Subject: Re: [SystemSafety] Bursting the formal methods bubble
>>
>> Steve,
>>
>>> “If you do multiple implementations, e.g., Ada+formal or
>>> Ada+Visual Basic, you will detect more problems and the result will
>>> be more reliable software.”
>>>
>>> Do you mean “N-Version development”? If so, this would seem to
>>> contradict results published by Susan Brilliant, John Knight, and
>>> Nancy Leveson in
>> ...
>>> The point I got out of that paper was that developers don’t make
>>> randomly different mistakes, they usually make the same systemic
>>> mistakes (notably
>>
>> People do make the same mistakes during the specification, design,
>> requirements, and testing phases.
>>
>> Using different sets of people increases the probability that problems
>> will be found in the specification/design, through people having
>> different points of view and asking different questions (I cannot find a
>> DOD N-version study that documented this effect).
>>
>> Having multiple implementations provides a means for comparing output,
>> looking for differences.
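>>
>> As a minimal sketch of that comparison step (the majority-voting scheme
>> is an illustrative assumption, not something taken from the studies
>> discussed here):
>>
>>     from collections import Counter
>>
>>     def vote(outputs):
>>         # Majority vote over the outputs of N independently built
>>         # versions; any disagreement marks a 'difference' worth
>>         # investigating as a potential fault.
>>         (value, count), = Counter(outputs).most_common(1)
>>         if count <= len(outputs) // 2:
>>             raise RuntimeError(f"no majority among {list(outputs)}")
>>         return value
>>
>>     print(vote([42, 42, 41]))  # -> 42; the odd one out (41) points
>>                                #    at a fault in one version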
>>
>> There has been some work looking at how using different languages and
>> computing platforms leads to more 'difference' faults being found.
>>
>> Would more faults be uncovered by reimplementing an Ada program in
>> Visual Basic or in a formal notation? Both are very different from Ada.
>>
>>> -----Original Message-----
>>> From: systemsafety
>>><systemsafety-bounces at lists.techfak.uni-bielefeld.de>
>>> on behalf of Derek M Jones <derek at knosof.co.uk>
>>> Organization: Knowledge Software, Ltd
>>> Date: Wednesday, October 25, 2017 at 6:36 AM
>>> To: jean-louis Boulanger <jean.louis.boulanger at gmail.com>
>>> Cc: "systemsafety at lists.techfak.uni-bielefeld.de"
>>> <systemsafety at lists.techfak.uni-bielefeld.de>
>>> Subject: Re: [SystemSafety] Bursting the formal methods bubble
>>>
>>> jean-louis,
>>>
>>>> I think that we have more examples of mistakes in classical
>>>> development than in formal development
>>>
>>> This is because there is so little formal development; we
>>> don't have enough data to reliably say one way or the other.
>>>
>>>> Yes, all methods are applied by humans and errors appear, but with
>>>> formal methods you have a tool to detect them.
>>>
>>> There are tools to detect mistakes in code (not that these are widely
>>> used :-(
>>>
>>>> With Ada, C, C++, Java, or C# it's more difficult to develop
>>>> software with a very low level of bugs ...
>>>
>>> If you do multiple implementations, e.g., Ada+formal or
>>> Ada+Visual Basic, you will detect more problems and the result will
>>> be more reliable software.
>>>
>>> The important question is bugs (or lack of) per buck.
>>> Which technique is the most cost effective?
>>>
>>
>
--
Derek M. Jones Software analysis
tel: +44 (0)1252 520667 blog:shape-of-code.coding-guidelines.com
_______________________________________________
The System Safety Mailing List
systemsafety at TechFak.Uni-Bielefeld.DE