[SystemSafety] Stupid Software Errors [was: Overflow......]

Matthew Squair mattsquair at gmail.com
Mon May 4 14:31:37 CEST 2015


Maybe the conclusion is just that 'people are just no damn good', to quote
that great Australian philosopher Nick Cave.

On the other hand, I don't think we should lose sight of the fact that the
Boeing 'bug' was found by running a long-duration simulation, not by an
airliner falling out of the sky. So perhaps thanks is due to the Boeing
safety or software engineer(s) who insisted on a long-duration endurance test
and who might actually have learned something from history?



On Mon, May 4, 2015 at 4:41 PM, Peter Bernard Ladkin <
ladkin at rvs.uni-bielefeld.de> wrote:

> I wrote a version of the following a few days ago to a closed list.
>
> AA has EFBs crashing on a number of flights. Apparently two copies of the
> approach chart for Reagan Washington National airport were included after
> the latest update of the EFB, and the app wasn't able to handle having two
> files with almost-identical metadata denoted as "favorites".
> A colleague who flies for a major airline (not AA) which uses EFBs spoke
> of some colleagues having
> their EFBs crash early on Jan 1 one year - they fixed it by rolling the
> date back a day.
>
> On the Boeing 787: think of the 32-bit Unix clock, and the many similar
> examples. There's even a Wikipedia page:
> http://en.wikipedia.org/wiki/Time_formatting_and_storage_bugs .
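>
> As a back-of-the-envelope illustration (my sketch, not anyone's actual
> avionics code): assume a counter of hundredths of a second held in a
> signed 32-bit integer. It can only run for so long before it wraps:
>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     /* Illustrative only: how long a signed 32-bit counter of
>        hundredths of a second can run before it wraps. */
>     int main(void)
>     {
>         const double max_ticks = (double)INT32_MAX;        /* 2^31 - 1 */
>         const double days = max_ticks / (100.0 * 60 * 60 * 24);
>         printf("counter wraps after about %.1f days\n", days);
>         return 0;
>     }
>
> That works out to roughly 248 days, which is why only a long-duration
> endurance run (or an aircraft left powered for months) ever sees it.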
>
> Remember Apple's goto fail (CVE-2014-1266) from 2014: a duplicated goto
> that skipped the final signature-verification check.
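>
> The pattern, sketched in simplified form (not Apple's actual source; the
> stub check functions are hypothetical stand-ins for the real hash and
> signature checks):
>
>     #include <stdio.h>
>
>     /* Hypothetical stand-ins for the real checks. */
>     static int check_hash(void)      { return 0; }
>     static int check_signature(void) { return 0; }
>
>     static int verify(void)
>     {
>         int err = 0;
>
>         if ((err = check_hash()) != 0)
>             goto fail;
>             goto fail;                        /* duplicated line: always taken */
>         if ((err = check_signature()) != 0)   /* never reached */
>             goto fail;
>
>     fail:
>         return err;   /* still 0: reports success, signature never checked */
>     }
>
>     int main(void)
>     {
>         printf("verify() returned %d\n", verify());
>         return 0;
>     }
>
> Any dead-code analysis worth the name (for instance clang's
> -Wunreachable-code) flags the skipped check, which is rather the point of
> the question about static analysis below.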
>
> These are simple, known types of error. Forty years ago, it was known how
> to avoid all these kinds
> of problems. Twenty years ago, there were industrial-quality engineering
> tools available (proper
> languages and coding standards checkers) which enabled companies to avoid
> such problems without
> undue development costs.
>
> I don't buy Derek Jones's or Tom Ferrell's versions of the curate's egg. I
> don't see why anyone
> else should, either. Are they still going to be saying "well, it depends,
> it's complicated" in
> another twenty years when stupid coding errors still make it through into
> supposedly-dependable
> software products?
>
> Look at goto fail. That's critical code! How come critical code such as
> that is not routinely subject to static analysis?
>
> Look at the 787 generator code. A systematic loss of all generators is
> surely a hazardous event.
> That should make it 10^(-7). Oh, but I forgot. Even though correct
> operation of SW contributes to
> the 10^(-7), the reliability of the SW itself is not assessed. But surely
> it gets to be at least
> DAL B, since the result is a hazardous event? Oh, but I forgot something
> else. A systematic
> failure like that would be common cause, and the certification
> requirements concern single
> failures, not common cause failures. So that's all right then. Tom's
> suggestion that it might have
> been a design compromise is vitiated by the fact that the phenomenon is
> subject to an
> AIRWORTHINESS Directive by the FAA. (Is that sufficient emphasis?)
>
> If people had told me thirty years ago that we'd still be making the same
> stupid mistakes in the
> same ways, but this time in code more fundamental to the safe or secure
> operation of everyday
> engineered objects, I wouldn't have believed it.
>
> Maybe it's a social thing. Mostly, people actually writing the code and
> inspecting it are in their
> twenties and their bosses maybe at most in their early thirties. The young
> people have never made
> *this* mistake before - the previous lot had of course, but they're all in
> management now. I'm
> reminded of Philip Larkin's ode to rediscovery, Annus Mirabilis:
>
> Sexual intercourse began
> In nineteen sixty-three
> (Which was rather late for me)-
> Between the end of the Chatterley ban
> And the Beatles' first LP.
>
> The Ensuing Discussion.
>
> There was obviously discussion on the list of why we are making the same
> old mistakes forty years
> after it was known how to avoid them. Some discussants suggested it might
> help to professionally
> certify software engineers, a PE. Others referred to the Knight-Leveson
> study a decade ago for the
> ACM, in which inserting SE into the current PE scheme was not seen as
> advantageous. UK discussants
> pointed out that such certification exists in the UK, as a CEng through
> the BCS or IET, and that
> there had been some UK consideration of extra qualification for
> critical-software engineering.
>
> Such qualification for system safety hasn't (yet) generally caught on
> anywhere. SaRS offers it in the UK, for example. It didn't catch on in the
> US. Over a decade ago, the System Safety Society
> System Safety Society
> introduced an option for system safety engineering into the PE exam. They
> had to pay the NSPE or
> NCEES (I forget which) lots of money per year to maintain the option - and
> two people took it in
> some number of years. So they dropped it. (I was at the board meeting in
> Ottawa in 2004 when this
> was decided.)
>
> The UK qualification regime hasn't stopped IT disasters in government
> procurement. And it hasn't
> stopped the kind of poor engineering which allows bank ATMs that use
> supposedly pseudo-one-time-pad nonce generation to be subject to replay
> attacks (see a recent paper describing local experiments performed by Ross
> Anderson's group). I do note, however,
> that the three examples
> I mentioned above are all US examples. It's not ruled out that having some
> degree of formal
> professional training, as in the UK, encourages software engineers to
> avoid repeating simple
> mistakes whose prophylaxis has been well known for decades.
>
> Time was when UK and US cars were not known for their reliability. Rather
> like SW, relatively inexpensive cars used to go wrong a lot. However, some
> very expensive cars, such as those made by Rolls-Royce/Bentley and Wolseley,
> were reliable. So there was proof of
> concept. Japanese
> companies decided it was possible to produce reliable
> relatively-inexpensive cars and make money,
> and did it.
>
> There is proof of concept in SE, too. Unlike Rolls-Royce cars, it is not
> prohibitively expensive.
> Three out of my four examples involve run-time error. It is feasible to
> produce SW
> cost-effectively which is free from run-time error. As with the Japanese
> approach to cars, you just have to decide to do it.
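>
> To make 'free from run-time error' concrete for the numerical-overflow
> case: here is one (illustrative) way of writing an addition in C, assuming
> a GCC or Clang toolchain, so that the result cannot silently wrap and the
> out-of-range case has to be handled explicitly:
>
>     #include <stdint.h>
>     #include <stdio.h>
>
>     /* Illustrative sketch: __builtin_add_overflow (GCC/Clang)
>        reports overflow instead of letting the result wrap. */
>     static int checked_add(int32_t a, int32_t b, int32_t *sum)
>     {
>         return __builtin_add_overflow(a, b, sum) ? -1 : 0;
>     }
>
>     int main(void)
>     {
>         int32_t sum;
>         if (checked_add(INT32_MAX, 1, &sum) != 0) {
>             fprintf(stderr, "addition would overflow; rejecting input\n");
>             return 1;
>         }
>         printf("%ld\n", (long)sum);
>         return 0;
>     }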
>
> How about the following? We design a document called A Programmer's
> Pledge. It has thirty or so
> numbered clauses:
>
> * I promise never to deliver SW which is subject to a data-range roll-over
> phenomenon (especially
> dates and times)
>
> * I promise never to deliver software which is subject to a numerical
> overflow or underflow exception
>
> * I promise never to deliver software which reads data on which it raises
> an "out of range" exception
>
> * ..... and so on
>
> A professional programmer signs it and files it with his/her professional
> organisation. Quality-control issues in programs (such as the above
> phenomena) are routinely subject to root-cause analysis (RCA) of sorts.
> When a programmer is responsible for a piece of code with such an error in
> it, the company reports it to the professional organisation and the
> programmer gets "points" attached to the corresponding clause in his/her
> Pledge. Like with driving (Germans say "points in Flensburg", which is where
> the office is. What is it in the UK? "Points in Cardiff"?). I bet lots of
> organisations, from companies hiring programmers to professional-insurance
> companies, will find uses for it.
>
> PBL
>
> Prof. Peter Bernard Ladkin, Faculty of Technology, University of
> Bielefeld, 33594 Bielefeld, Germany
> Je suis Charlie
> Tel+msg +49 (0)521 880 7319  www.rvs.uni-bielefeld.de
>
>
>
>



-- 
*Matthew Squair*
MIEAust CPEng

Mob: +61 488770655
Email: MattSquair at gmail.com
Website: www.criticaluncertainties.com

