<div dir="ltr"><div>While Single Event Upsets (SEUs) originate in hardware, they can be mitigated through hardware design, software design and/or system design. For example, when I worked on the Airbus A380 Landing Gear Extension and Retraction System (LGERS), the main mitigation against SEUs was the fact that LGERS is a multiple redundant system, meaning that an SEU in one channel will not affect the other channels. However, we were also required to design the software so that we kept three copies of critical data (one of the copies was the one's complement of the other two). This meant that we were able to detect an SEU corrupting one of the copies and restore the correct value from one of the other two copies. We recorded the number of data corruptions in NVRAM. It would be interesting to know how often SEUs have occurred in practice - I expect that SEUs occur quite often, especially at altitude.</div><div><br></div><div>Both the FAA and EASA have required avionic systems to mitigate SEUs for a long time. EASA Cert Memo CM-AS-004 defines the certification considerations concerning Single Event Effects (SEE). RTCA DO-248C/EUROCAE ED-94C DP #21 provides clarification on SEU as it relates to software. DP #21 suggests protection mechanisms such as parity, cyclic redundancy codes, Hamming codes and storing triple versions of critical data.</div><div><br></div><div>My understanding is that the vulnerability that resulted in the in-flight upset was introduced in version L104 and that it can be avoided by reverting to version L103+. I'm guessing that the software developer introduced some new functionality in L104 but failed to protect critical data required to implement the new functionality. The fact that L104 resulted in an in-flight upset soon after it was introduced suggests that SEUs are relatively common and are usually mitigated by hardware, software and/or system design. 
I expect that the supplier is developing a new software version that will re-introduce the new functionality but protect critical data against SEUs.</div><div><br></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div style="color:rgb(34,34,34)"><a name="SignatureSanitizer_m_-5798674576462993830_SignatureSanitizer_SafeHtmlFilter_UNIQUE_ID_SafeHtmlFilter__MailAutoSig"><span style="font-size:10pt;font-family:Arial,sans-serif">Yours,</span></a><br></div><div style="color:rgb(34,34,34)"><div dir="ltr"><div dir="ltr"><p><span style="font-family:Arial,sans-serif;font-size:10pt">Dewi Daniels | Director | Software Safety Limited</span><br></p><p><span lang="FR" style="font-size:10pt;font-family:Arial,sans-serif">Telephone +44 7968 837742 | Email <a href="mailto:dewi.daniels@software-safety.com" target="_blank">dewi.daniels@software-safety.com</a></span></p><p><font face="Arial, sans-serif">Software Safety Limited is a company registered in England and Wales. Company number: </font><font face="Arial, sans-serif">9390590</font><font face="Arial, sans-serif">. Registered office: Fairfield, 30F Bratton Road, West Ashton, Trowbridge</font><span style="font-family:Arial,sans-serif">, United Kingdom </span><span style="font-family:Arial,sans-serif">BA14 6AZ</span></p></div></div></div></div></div></div><br></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Mon, 1 Dec 2025 at 08:47, Prof. Dr. Peter Bernard Ladkin <<a href="mailto:ladkin@techfak.de">ladkin@techfak.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Les,<br>
<br>
not being a computer scientist you may not be aware that fault tolerance has for many decades been a <br>
major theme in computer science. The IEEE International Symposium on Fault-Tolerant Computing (FTCS) <br>
started in 1971, 54 years ago. There is IFIP Working Group 10.4 on Dependability and Fault Tolerance <br>
which has been running, as far as I know, for about as long. Jean-Claude Laprie was Chair for many <br>
years. Brian Randell and colleagues at Newcastle University established the computer science <br>
department there, I believe the first in the UK, as a major centre for research into fault tolerance <br>
(you may have heard of "recovery blocks"?). Brian and Tom Anderson were members of IFIP WG 10.4 for <br>
many, many years (maybe still are?). IEEE has a Technical Committee on Dependability and Fault <br>
Tolerance, but its "flagship" conference is now DSN rather than FTCS.<br>
<br>
IFIP WG 10.4 was, I believe, the first organisation to understand that dependability of digital <br>
systems meant rather more than just reliability. Their first terminology was published in 1992 in <br>
five languages by Springer Verlag. Safety and (what was then called) security (which I now prefer to <br>
call cybersecurity) were considered by them to be dependability attributes, for very good reason.<br>
<br>
The IEC, by contrast, considers neither safety nor cybersecurity part of dependability. TC 56 is <br>
Dependability. Digital-system safety in the IEC resides with SC 65A, the "Safety Aspects" <br>
subcommittee (used to be the "System Aspects" SC) of TC 65, Industrial-process control, measurement <br>
and automation. Industrial-process cybersecurity resides in TC 65, although there is a movement to <br>
make their cybersecurity standards more widely applicable (called a "horizontal" function), as SC <br>
65A's safety standard IEC 61508 is (many of us are sceptical about this move).<br>
<br>
So there is a fair amount of silo-ing in the international organisations trying to <br>
define/capture/explicate the state of the art in digital systems dependability, even just in the <br>
computer-science area.<br>
<br>
There are a couple of sources of faults/failures that "come out of nowhere", which weren't paid so <br>
much attention by FT types 30 years ago, but have in the succeeding period increased substantially <br>
in importance. SEEs are one. People dealing with spacecraft routinely protect against SEEs that may <br>
be caused by alpha particles. Protecting against alpha particles is relatively easy compared with <br>
protecting against the derivatives which occur when these alpha particles interact with the earth's <br>
atmosphere (called cosmic rays). Then there are Byzantine faults and failures. Algorithmically <br>
resolving Byzantine failures deterministically is known (from Lamport's first paper on the subject) <br>
to be computationally expensive, but there are some network architectures that mitigate their <br>
occurrence (Kevin Driscoll, who is on this list, is the foremost expert on occurrences "in the wild").<br>
<br>
On 2025-11-30 22:54 , Les Chambers wrote:<br>
> ... I'm surprised<br>
> that this could happen in aviation, which is typically the gold standard in<br>
> Safety-Critical systems design.<br>
<br>
And fault-tolerant digital design. The circumstance that is flabbergasting everyone is, I think, <br>
that they got it right, developed the system further, and got it wrong (whoever "they" is). That is <br>
usually not the way industrial progress works. (Thales, the manufacturer of the ELAC, apparently <br>
told Reuters that "the functionality in question is supported by software that is not under Thales' <br>
responsibility". <br>
<a href="https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/" rel="noreferrer" target="_blank">https://www.reuters.com/business/aerospace-defense/airbus-a320-repairs-must-be-before-next-flight-bulletin-shows-2025-11-28/</a> <br>
)<br>
<br>
A few more details on the incident: "JetBlue Flight 1230, operating from Cancún International <br>
Airport (CUN) to Newark Liberty International Airport (EWR), experienced an uncommanded drop in <br>
altitude approximately one hour after departure. The aircraft, registered N605JB , rapidly lost <br>
about 14,500 feet in five minutes, followed by another 12,200 feet in the next five minutes. The <br>
crew diverted to Tampa International Airport (TPA) and landed at approximately 1420 local time." <br>
from <a href="https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/" rel="noreferrer" target="_blank">https://avgeekery.com/airbus-a320-emergency-airworthiness-directive/</a> I have no experience with <br>
this site and thus don't know how reliable this account can be presumed to be. But that must have <br>
been pretty harrowing for the crew -- the incident played out over ten minutes and they apparently <br>
weren't able to counter.<br>
<br>
PPRuNe probably has a lot more, but this weekend (and into today) I just couldn't face the high <br>
noise-to-signal ratio.<br>
<br>
PBL<br>
<br>
Prof. i.R. Dr. Peter Bernard Ladkin, Bielefeld, Germany<br>
<a href="http://www.rvs-bi.de" rel="noreferrer" target="_blank">www.rvs-bi.de</a><br>
<br>
<br>
<br>
<br>
_______________________________________________<br>
The System Safety Mailing List<br>
<a href="mailto:systemsafety@TechFak.Uni-Bielefeld.DE" target="_blank">systemsafety@TechFak.Uni-Bielefeld.DE</a><br>
Manage your subscription: <a href="https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety" rel="noreferrer" target="_blank">https://lists.techfak.uni-bielefeld.de/mailman/listinfo/systemsafety</a></blockquote></div>
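<div><br></div><div>For readers who have not seen the triple-copy scheme mentioned in the reply at the top of this message (two plain copies plus a one's complement copy, with a corruption count), here is a minimal sketch in Python. It is an illustration only, not the LGERS implementation: the class name ProtectedWord, the 32-bit word width, the field names and the in-memory corruptions counter (standing in for a count kept in NVRAM) are all assumptions of mine. It also assumes that at most one copy is corrupted between reads.</div>
<pre>
```python
MASK = 0xFFFFFFFF  # assumed 32-bit word width (not stated in the message)


class ProtectedWord:
    """Hypothetical holder for one critical value, kept as three copies."""

    def __init__(self, value):
        self.corruptions = 0  # stand-in for a counter recorded in NVRAM
        self.write(value)

    def write(self, value):
        value = value % (MASK + 1)    # truncate to the assumed word width
        self.copy_a = value           # first plain copy
        self.copy_b = value           # second plain copy
        self.copy_inv = value ^ MASK  # one's complement of the value

    def read(self):
        # Undo the complement so that all three copies are comparable.
        a, b, c = self.copy_a, self.copy_b, self.copy_inv ^ MASK
        if a == b == c:
            return a                  # no corruption detected
        if a == b or a == c:
            good = a                  # a agrees with at least one other copy
        elif b == c:
            good = b                  # a is the odd one out
        else:
            raise RuntimeError("more than one copy corrupted")
        self.corruptions += 1         # record the detected corruption
        self.write(good)              # restore all three copies
        return good


word = ProtectedWord(0x1234)
word.copy_b ^= 0x10                   # simulate a single bit flip in one copy
value = word.read()                   # detects the mismatch and repairs it
```
</pre>
<div>One plausible reason for storing the third copy complemented is that a fault that forces every copy to the same pattern (for example all zeros) cannot then pass as three agreeing copies. That is my reading, not something stated in the message.</div>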