[SystemSafety] Correctness by Construction
Peter Bernard Ladkin
ladkin at causalis.com
Fri Jul 10 14:44:13 CEST 2020
On 2020-07-10 13:28, Dewi Daniels wrote:
> I agree with Tom.
We are really all agreeing with Michael.
> I don't think you'd find the problem by doing more software testing.
Most obviously not. That was the point of Michael's question, I take it.
> 1. They assessed MCAS as DAL C because it was intended to have limited authority (initially 0.6
> degrees, later increased to 2.5 degrees).
It is a little more subtle than that. Boeing knew in 2016 that there was a hazardous event
associated with MCAS activation, but they took "statistical credit" to classify it as Major and
thereby DAL C. Here is a quotation from the US DoT Inspector General's report:
[begin quote]
March 16, 2016
....... [Revision D released]
In its MCAS Revision D, Boeing also included an assessment of functional hazards related to the
software, describing hazard descriptions, failure conditions, and
associated effects. One of the noted hazards was an uncommanded or automatic MCAS activation that
continued until the pilot took action. When developing this risk assessment, Boeing tested
unintended MCAS activation in the simulator and assumed that commercial pilots would recognize the
effect as a runaway stabilizer — a scenario which is covered in basic commercial pilot training —
and react accordingly. Boeing assumed the average pilot reaction time in this scenario to be 4
seconds, which Boeing classified as a hazardous event.[38] However, if a pilot’s reaction time was
greater than 10 seconds, the event would be classified as catastrophic due to the pilot’s inability
to regain control of the aircraft. Despite these significant revisions, Boeing did not provide
internal coordination documents for Revision D, noting the increased MCAS range, to FAA
certification engineers. Because these revision documents were not required certification
deliverables, the company did not submit them to FAA for review or acceptance.
Footnote 38: Boeing added a statistical credit in its evaluation of this scenario that reduced the
effect from Hazardous to Major, based on the assumption that it was unlikely that a typical flight
would be operating outside of normal aircraft parameters.
[end quote]
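To see how such a "statistical credit" works arithmetically, here is a minimal sketch in Python.
The severity-to-DAL correspondence (Catastrophic to Level A, Hazardous to B, Major to C, Minor to
D) is the standard ARP4754A/DO-178C one; everything else - the threshold, the probabilities, the
function - is a hypothetical illustration, not Boeing's actual analysis.

# Minimal sketch of a "statistical credit" in hazard classification.
# All numbers below are hypothetical illustrations, not Boeing's figures.

# Standard severity -> design assurance level map (ARP4754A / DO-178C).
SEVERITY_TO_DAL = {
    "Catastrophic": "A",
    "Hazardous": "B",
    "Major": "C",
    "Minor": "D",
    "No effect": "E",
}

def classify_with_credit(base_severity: str, p_exposure: float) -> str:
    """Downgrade severity one step if the exposure condition is deemed
    sufficiently unlikely (the 'statistical credit')."""
    order = ["Catastrophic", "Hazardous", "Major", "Minor", "No effect"]
    if p_exposure < 1e-3:  # hypothetical threshold for taking the credit
        i = order.index(base_severity)
        return order[min(i + 1, len(order) - 1)]
    return base_severity

# Unintended MCAS activation assessed as Hazardous, credited down to Major
# on the assumption that flight outside normal parameters is unlikely.
severity = classify_with_credit("Hazardous", p_exposure=1e-4)
print(severity, "-> DAL", SEVERITY_TO_DAL[severity])  # Major -> DAL C

On those hypothetical numbers the classification lands at Major, hence DAL C, which is exactly the
downgrade Footnote 38 describes.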
There is something more pernicious. Some of us were told in March 2019, on unimpeachable authority,
that there were known to be hazardous events associated with MCAS activation. Clive Leyman pointed
out that the regulations thereby preclude any single point of failure with probability greater than
10^(-7) per op hour. Relying on one AoA sensor is manifestly a single point of failure, and AoA
sensors are known to fail much more frequently than that (Peter Lemme wrote that on the Boeing 747
they have an average life of about 90,000 op hours). So Clive and I were left wondering what was
going on here. We posited a thoroughly bungled safety analysis, which turned out to be right.
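The arithmetic is stark. Taking Peter Lemme's 90,000 op-hour average life as a mean time between
failures, and assuming (as a sketch) a constant failure rate, a single AoA sensor misses the
threshold by two orders of magnitude:

# Back-of-envelope: single AoA sensor failure rate vs the regulatory
# threshold for hazardous failure conditions. Assumes a constant failure
# rate, i.e. rate = 1/MTBF.

mtbf_hours = 90_000   # Peter Lemme's figure for 747 AoA sensors
threshold = 1e-7      # max probability per op hour for a hazardous event

failure_rate = 1 / mtbf_hours   # ~1.1e-5 per op hour
print(f"single-sensor rate: {failure_rate:.2e} per op hour")
print(f"exceeds the threshold by a factor of ~{failure_rate / threshold:.0f}")
# single-sensor rate: 1.11e-05 per op hour
# exceeds the threshold by a factor of ~111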
Clive pointed out that you wouldn't allow a single point (channel) of failure in any case in which
there is a hazardous failure event, even if you thought it would fail sufficiently rarely - there
would be at least one back-up. We suspected that might not have been quite true at Boeing, because
of Sidney Dekker's observations on the Turkish Airlines Amsterdam accident.
So Boeing had significant reason to take "statistical credit" - had they not done so, they would
have had to redesign the system to provide back-up for an AoA sensor failure. They turned out to be
absolutely right about how much work that would be - it has taken the company 14 months, and the
revised MAX was released for flight test only in the last couple of weeks. (Of course, they now
know full well that they should have done it anyway.)
> 2. My understanding is that the only validation carried out of the requirement was that they applied
> 2.5 degrees of nose down trim in the simulator and confirmed the pilot was able to counteract the
> nose down trim by using the yoke and electric trim. It doesn't appear they simulated an AoA sensor
> failure either during simulation or during flight test. Had they done so, they would have realised
> that if the AoA sensor failed hard-over, applying nose down trim would not reduce the reported angle
> of attack, so MCAS would apply nose down trim repeatedly until full nose down trim was applied.
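The failure mode Dewi describes is an unchecked feedback loop, easy to see in a toy model. The
sketch below (hypothetical numbers and grossly simplified logic; not Boeing's control law) shows
why a hard-over AoA reading defeats the pilot's counter-trimming: each activation cycle re-reads
the sensed AoA, which never comes down, so the nose-down increments accumulate until the stabilizer
reaches its stop.

# Toy model of repeated MCAS activation with a hard-over AoA sensor.
# All values are illustrative; this is not the actual control law.

SENSED_AOA = 25.0       # failed hard-over: the reading never changes
AOA_TRIGGER = 12.0      # hypothetical activation threshold
INCREMENT = 2.5         # nose-down trim units per activation
FULL_NOSE_DOWN = 17.0   # hypothetical stabilizer travel limit

trim = 0.0
for cycle in range(10):
    if SENSED_AOA > AOA_TRIGGER:        # always true: nose-down trim
        trim = min(trim + INCREMENT,    # cannot reduce a stuck AoA
                   FULL_NOSE_DOWN)      # reading, so the loop repeats
    print(f"cycle {cycle}: trim = {trim}")
    if trim >= FULL_NOSE_DOWN:
        print("full nose-down trim reached")
        break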
But they knew that an erroneous MCAS activation could have catastrophic consequences. Not only did
the DoT IG say so (above), but here is a sentence from the preliminary version of the Congressional
report: "Boeing also withheld knowledge that a pilot would need to diagnose and respond to a
“stabilizer runaway” condition caused by an erroneous MCAS activation in 10 seconds or less, or risk
catastrophic consequences." (p. 3)
I suspect it is a greyer area regulation-wise than it seems. Yes, it's DAL C, you thought. Then you
do a simulator run and find that the human recovery action which enabled you to classify the system
as DAL C is rather a fragile operation.
As far as I know, there is no *required* feedback from any human factors analysis of a recovery
operation to the safety analysis and the Design Assurance Level.
Of course, JATR already observed in October 2019 that the regs need fixing.
> I believe that to improve aircraft safety, we need to get
> better at writing requirements and validating requirements.
Yes. But not just aircraft safety. And not just "we" in the sense of software specialists. The MAX
8 accidents, the Lufthansa A320 Warsaw accident which Dewi mentions, and others besides are matters
of aerospace engineers getting system requirements wrong and not picking it up in iron-bird or
flight testing.
> There have been no hull-loss accidents in passenger service caused by
> the software implementing the requirements incorrectly,
Um, yes, but there could well have been three. The Malaysia Airlines 777 out of Perth in 2005, the
Qantas A330 at Learmonth in 2008, and another Qantas A330 later that year (December, I think). Luck
and piloting skill saved them all.
(BTW, in your SCSC keynote you refer to "Fallacy #1", lack of FAA oversight. Far from considering
it a fallacy, the House Committee made that the first of its six major failings (p. 4).)
PBL
Prof. Peter Bernard Ladkin, Bielefeld, Germany
Tel+msg +49 (0)521 880 7319 www.rvs-bi.de