[SystemSafety] Safety Culture redux (David Green)

David Crocker dcrocker at eschertech.com
Thu Feb 22 10:14:23 CET 2018


On a related subject, does anyone have a good definition of what
constitutes a software error/fault/bug etc. that is widely applicable,
not just to critical systems? The definition I use is "failure to meet
the reasonable expectations of the user", where what is reasonable is
influenced by the documentation (including requirements specification if
available, user manual etc.) or lack of it. But perhaps one of you has a
better definition.

David Crocker, Escher Technologies Ltd.
http://www.eschertech.com
Tel. +44 (0)20 8144 3265 or +44 (0)7977 211486

On 22/02/2018 08:50, Chris Hills wrote:
> Hi,
>
> I don't mind what words we eventually use (error, failure, fault, defect) as long as everyone gets on board with "errors not bugs" and writes about it during 2018. Get the discussion going and move away from "bug" and the cosy expectancy of them.
>
> Once we use error/defect etc. it shifts the emphasis from the expected to something that needs sorting. It also puts the ownership back with the programmers. Once they have to fix "errors" and are therefore more careful, it will work back up the process.
>
> As it is now, programmers are generally happy to work with incomplete and ambiguous requirements and designs. They fill in the blanks. If we get a sea change in the terminology from bugs to error/defect etc., then hopefully they will want to stop being associated with "error" and will, wherever possible, start demanding that incomplete designs or requirements are fixed. They won't want to carry the can for someone else's errors.
>
> It's not going to happen overnight, but let's get the discussion going in 2018 and start the change. Start writing articles, blogs, papers etc. on "errors not bugs" and take it from there. Even if you end up with defect or failure, get the conversation going and stamp out bugs. If more of you do it along with those of us who have started the ball rolling, it might actually work.
>
> If not now, when? After you or your family are killed by a friendly software "bug"?
> Do it for the children. :-) 
>
>
> -----Original Message-----
> From: systemsafety [mailto:systemsafety-bounces at lists.techfak.uni-bielefeld.de] On Behalf Of Peter Bernard Ladkin
> Sent: Thursday, February 22, 2018 6:52 AM
> To: systemsafety at lists.techfak.uni-bielefeld.de
> Subject: Re: [SystemSafety] Safety Culture redux (David Green)
>
>
>
> On 2018-02-22 00:54 , Steve Tockey wrote:
>> IEEE already has a recommended vocabulary:
>>
>> Incident = any difference between the observed result and the expected 
>> result
>>
>> Failure = it has been determined that the observed result is incorrect
>>
>> Fault or Defect = the aspect of the code that caused the incorrect result
>>
>>
>> If adequate vocabulary already exists, why try to invent new terms?
> Because there are things wrong with this series of definitions.
>
> First, an incident in most people's usage is an event. With nothing counterfactual about it. It just is (or was). A "difference" is not an event, but a contrastive feature of two things, one of which is counterfactual. So the definition of "incident" here confuses an event (what did happen) with its features (that one of the aspects contrasts with what was expected).
>
> Contrastive description is common and useful, but it is better not to conflate an event with its description, for the following reasons amongst others. Obviously, in order to individuate an event you do so with a description, because that is in part how language works. A description (if it fits) picks out an aspect of an event. But that aspect may be superficial, and not key. If someone proffers a superficial description, you want a second person to be able to say "that is not all of what went wrong here, that is just a part of it". Whereas, with this definition, the second person is not refining what the first said by identifying a more significant aspect, they are literally describing a different incident. You have as many different incidents as you do aspects, and the set of aspects is not usually very well bounded. William of Ockham had something to say about that.
>
> It is usual to designate a complex-system incident or accident as an event, one event. But according to the IEEE definition, it becomes a plethora of difference specifications.
>
> Second, the definition of "failure" requires a "determination", which is a human act. If the system is not sociotechnical, then failure is an objective matter without a social component. Further, I think we can bet that the IEEE does not say what a "determination" consists in. Continuing, the definition makes essential use of the notion of "correct". Is that defined somewhere? "Correct" and "incorrect" are both notions which involve a comparison between a result and a norm. What norm would that be? "What the system should have done"? The problem there is the word "should", which has a moral connotation. "What person X thinks would have been a more appropriate outcome"? How do you pick person X? "What most people dealing with the system agree would have been a more appropriate outcome"? How do you select that crowd? We might like to say "What the system specification says happens in that case". But that supposes the system has a specification, and that specification is adequate to determine how the system behaves in this case. One suspects the definition was formulated to finesse that need.
>
> Third, the common idea of "fault" is "<certain system aspects> which caused the failure" (with "<certain system aspects>" to be determined). It was likely causally contributory to the failure that the system received certain inputs - the definition of "fault" here entails that the presence of those inputs is part of the fault. Should that be so? Intuitively, we would say no: (a) if the inputs were inappropriate, they should have been filtered and the lack of filtering was part of the fault, not the inputs themselves; (b) if the inputs were appropriate, then it is the way the system processed them that is usually taken to be the fault, not the inputs themselves.
>
> Can we fix these issues easily? Sure. I recommend, as usual, the definitions in https://causalis.com/90-publications/99-downloads/DefinitionsForSafetyEngineering.pdf
>
> PBL
>
> Prof. Peter Bernard Ladkin, Bielefeld, Germany MoreInCommon Je suis Charlie
> Tel+msg +49 (0)521 880 7319  www.rvs-bi.de
>
>
>
>
>
>
> _______________________________________________
> The System Safety Mailing List
> systemsafety at TechFak.Uni-Bielefeld.DE
