[SystemSafety] State of the art for "safe Linux"

Dewi Daniels dewi.daniels at software-safety.com
Tue Aug 6 15:27:40 CEST 2024


Paul,

Thank you for taking the time to respond to my comments. Please see my
responses below.

On Mon, 5 Aug 2024 at 18:33, Paul Sherwood <paul.sherwood at codethink.co.uk>
wrote:

> What do you mean by Linux, here? If you are meaning "the people who
> develop Linux", then I think the reason is that they have been able to
> achieve their goals for the software without compliance. In almost all
> cases the developers will not have needed to consider those specific
> standards at any point during their careers.
>

So, the problem is not that Linux cannot meet the relevant standards, but
that there is no motivation for the Linux developers to comply with the
standards? Surely, if Linux is to be used for safety applications, there
needs to be a 'Safe Linux' branch that complies with the relevant
standards.

> Even Level C isn't that hard. Why wouldn't
> > you want to achieve statement coverage?
>
> Actually, I wouldn't, because of my personal bias. I've successfully
> delivered small amounts of critical code (in the few thousands of LoC
> range) without unit tests, for systems which then worked for years
> without problems. Usually I've had more success with system tests than
> unit tests, on many projects.
>

I accept that before I started writing safety-critical software, I didn't
necessarily test every single line of code. Dave Thomas, who I used to work
for, says in his GOTO 2015 presentation 'Agile is Dead'
<https://www.youtube.com/watch?v=a-BOSpxYJ9M&t=457s> (at 18:00) that he
doesn't test everything. However, there is a big difference
between safety applications and other kinds of applications. Even IEC 61508
SIL 1 or DO-178C Level D is a very strong claim. Since I started writing
safety-critical software, I have tended to test everything just because it
isn't that hard and doesn't take that much time.

DO-178C doesn't require you to achieve 100% structural coverage through
unit tests. It requires that your requirements-based tests (which can be
hardware/software integration tests, software integration tests or
low-level tests) cover all the software requirements and exercise all the
code. If your tests haven't covered all the requirements, there's
functionality you haven't tested. If your tests haven't achieved statement
coverage, then there's code that you've never executed, not even once,
during your testing.
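To make the statement-coverage point concrete, here is a toy sketch (entirely my own illustration, not anything from DO-178C; the `clamp` function and its "requirements" are hypothetical). The code records which statements actually run, the way a coverage tool would:

```python
# Hypothetical example: a clamp function instrumented by hand to record
# which statements execute. Both requirements-based tests below pass,
# yet one statement is never executed; statement-coverage analysis is
# what exposes that gap.
executed = set()

def clamp(value, lo, hi):
    executed.add("entry")
    if value < lo:
        executed.add("below")
        return lo
    if value > hi:
        executed.add("above")  # upper-bound case
        return hi
    executed.add("nominal")
    return value

# Two requirements-based tests that happen to miss the upper bound:
assert clamp(5, 0, 10) == 5    # nominal case
assert clamp(-3, 0, 10) == 0   # below range

missed = {"entry", "below", "above", "nominal"} - executed
print(missed)  # the "above" statement was never executed
```

The tests all pass, but the coverage record shows the upper-bound statement was never run, not even once, which is exactly what a structural-coverage objective is there to catch.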

> I remember attending a talk by
> > the CEO of Red Hat Linux UK, who admitted that most of their
> > developers are volunteers and prefer to spend their time coding rather
> > than writing tests.
>
> While it is true that most of Red Hat's engineers contribute voluntarily
> to open source projects in their spare time, it is also true that Red
> Hat pays some thousands of engineers to work full-time on contributing
> to the Linux kernel and a huge range of open source projects.
> Irrespective of personal preference, I would expect that if Red Hat
> chooses to increase test coverage on any project, they are entirely
> capable of doing so.
>

You claim they are capable of doing so, but they choose not to. It comes
back to the point above that they don't see a commercial incentive to
create a DO-178C or IEC 61508 variant of Red Hat Linux.

> The open-source repositories that I've inspected
> > contain alarmingly few tests.
>
> Well, I think there are hundreds of millions of open source repositories
> on GitHub, and I'm sure almost all of them are clearly not suitable for
> use in critical systems. But there are some thousands of open source
> projects that have been developed and maintained to extremely high
> standards.
>

There are open-source repositories that are suitable for use in critical
systems. One example is AdaCore's GNAT Ada compiler. Most open-source
projects (and I suspect most proprietary software projects), even those
sponsored by large companies, do surprisingly little testing. For example,
I reviewed part of the ARM Trusted Computing Platform on behalf of the
CHERI Standards Compliance (STONE) project (final report:
<https://static1.squarespace.com/static/5f8ebbc01b92bb238509b354/t/617924ea4bc0ce729ca8591c/1635329260523/CHERI+STONE+-+Final+Report+-+for+publication+v1.0.pdf>).
The System Control Processor (SCP) firmware consisted of 381,512 SLOC, but
there were only 5,291 SLOC of tests. There are very few open-source
projects that do enough testing to achieve even DO-178C Level D or IEC
61508 SIL 1.
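For scale, the two SLOC figures quoted above work out to a test-to-code ratio of roughly 1.4% (a back-of-the-envelope calculation using only the numbers from the STONE report):

```python
# SLOC figures as quoted from the CHERI STONE final report.
scp_sloc = 381_512   # System Control Processor firmware
test_sloc = 5_291    # accompanying tests

ratio = test_sloc / scp_sloc
print(f"test-to-code ratio: {ratio:.1%}")  # roughly 1.4%
```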

> IEC 61508 provides three compliance routes:
>
> Route 1s: compliant development. Why is it so hard for Linux to just
> > comply with IEC 61508? The requirements for SC1 or SC2 are not very
> > onerous.
>
> As previously stated, the people developing the Linux kernel in general
> have little/no experience (or need to comply) with any specific
> standard. Red Hat is actively working on safety for their RHIVOS
> product, focusing on ISO 26262. Perhaps they will achieve compliance,
> but I suspect there may be a lot of 'tailoring'.
>
> As stated, previous initiatives have failed to achieve certification,
> but this does not mean that Linux has not been deployed in critical
> systems.
>

How was it deployed in critical systems if it was not certified?

> Route 2s: proven in use. This is not considered practicable for
> > complex software such as operating systems. This was recognised by
> > SIL2Linux.
>
> At the risk of seeming 'insulting' again, it seems to me that the proven
> in use path is practically impossible for anything involving modern
> electronics, even before we get to the software.
>

Agreed.

> Route 3s assessment of non-compliant development. This is a very
> > sensible way of allowing the use of an open-source operating system
> > such as Linux (more so than the COTS guidance in DO-278A). Route 3s
> > can be summarised as a. what do we know about the software? b. what
> > evidence do we have that it works? and c. what happens if it goes
> > wrong? This was the approach adopted by SIL2Linux. Yet they failed.
> > I'd like to understand why.
>
> I have some understanding of why, but it is not my story to tell.
>

I would love to know more, since IEC 61508-3 Route 3s seemed a sensible
route for SIL2Linux to take.

> Your post is insulting to certification authorities and those of us
> > who participate in standards committees.
>
> I'm sorry you feel that way. My aim was not to insult, but to draw
> attention to the gaps which clearly exist between the standards and the
> way modern software-intensive systems are actually engineered.
>

I'm glad that you did not aim to insult. There is a big gap between
safety-critical software that complies with the standards and other
software that does not. That's always been the case. Most software
developers care about features, cost and time-to-market. Safety-critical
software developers have to show that the system is acceptably safe
before it enters service. I don't agree with your use of
the term 'modern'. The Airbus Flight Control System (FCS) and the Linux
kernel are both modern software-intensive systems. They've been developed
to meet very different goals and have therefore been engineered
differently.

> You imply that the
> > certification authorities are being unreasonable and that they should
> > just allow people to use Linux.
>
> I certainly did not say any such thing, nor did I intend to imply it.
> Given experience of single-threaded microcontroller type systems with
> expectations of deterministic behaviour, many of the ideas in the
> standards are entirely reasonable.
>
> When reasoning about software for a modern multi-core processor, with
> perhaps a million lines of (uncertified) code hidden as firmware, the
> behaviour is not deterministic, so test coverage will not get there.
>

Accepted, but that's why a million lines of uncertified code running on a
multi-core processor, whose behaviour is not deterministic, is not suitable
to be used in a safety application.


> > I don't agree with your conclusion. If
> > Linux is as complicated as you say, they need to do more verification,
> > not less.
>
> I did not mean to suggest any lack of verification; there is a huge
> amount of work done to verify every Linux release. They just don't set
> much store by measuring code coverage via unit tests.
>

This is the aspect I have difficulties with. I accept that the Linux
developers don't set much store by measuring code coverage. DO-178C doesn't
require that you achieve structural coverage through unit tests, by the
way. But if they do as much testing as you claim to verify every Linux
release, it should be straightforward for a 'Safe Linux' team to fill the
gaps to achieve compliance with the relevant standards. If the compliance
gaps are large and the Linux developers rely instead on the extremely large
user base reporting any defects quickly so they can be fixed in the next
release, then that isn't good enough for safety applications.

You talk a lot about "state of the art". Linux is a widely respected,
mature operating system, but it is not "state of the art". I consider
formal methods to be "state of the art". Microsoft, Amazon Web Services and
Intel all have large formal methods teams. See, for example, the excellent
presentation by Rod Chapman, my former colleague from Praxis, at the High
Integrity Software Conference 2024:
<https://www.his-conference.co.uk/session/automated-reasoning-in-and-about-the-cloud>.
There is even an open-source operating system microkernel that has been
verified using formal methods: seL4 <https://sel4.systems/>.

> I don't see how you can use statistical techniques to
> > measure confidence in software as complex as an operating system.
>
> Understood. Hopefully we and others will manage to demonstrate that in
> spite of the doubts.
>

Prof. Peter Ladkin is one of the world's leading experts on practical
statistical evaluation of critical software. Peter has pointed out that
using statistical techniques to measure confidence in software as complex
as an operating system is a pipe dream.

> I'm reminded of the quote from Tony Hoare, "There are two ways of
> > constructing a software design: One way is to make it so simple that
> > there are obviously no deficiencies, and the other way is to make it
> > so complicated that there are no obvious deficiencies. The first
> > method is far more difficult". I'm also reminded of the quote from the
> > NTSB report on one of the Tesla accidents, "Just because you can
> > doesn't mean you should".
>
> Indeed. Thank you again for your comments.
>

Thank you for taking the time to respond to my comments.

Yours,

Dewi Daniels | Director | Software Safety Limited

Telephone +44 7968 837742 | Email dewi.daniels at software-safety.com

Software Safety Limited is a company registered in England and Wales.
Company number: 9390590. Registered office: Fairfield, 30F Bratton Road,
West Ashton, Trowbridge, United Kingdom BA14 6AZ