[SystemSafety] State of the art for "safe Linux"
Steve Tockey
steve.tockey at construx.com
Tue Aug 6 21:47:43 CEST 2024
Paul Sherwood wrote:
“I understand the argument, but this last sentence is flawed.
The fact that some statements are not covered by tests does not necessarily
mean that they have never been tested. For example the Linux community
benefits from the fact that their software is tested on billions of
devices every day.”
Sorry, I'm not buying that argument. Counter-example? Heartbleed. It was reported to have been introduced into OpenSSL's TLS heartbeat code in 2012 and apparently not discovered by "the good guys" until 2014. That no "normal" code triggered the buffer overrun did not mean the defect wasn't there. Who knows how many times that defect was maliciously exploited before it was finally discovered by "the good guys" and eliminated?
And Heartbleed was in software that WAS supposedly tested; the testing that was done was simply inadequate to reveal that defect.
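To make the pattern concrete, here is a minimal, hypothetical sketch of a Heartbleed-style over-read. The function and buffer names are invented for illustration and this is not the actual OpenSSL code; the point is that every "normal" request stays within bounds, while a malicious claimed length reads past the real payload and leaks adjacent memory.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Simplified, hypothetical illustration of the Heartbleed pattern.
     * 'claimed_len' is read from the attacker-controlled request;
     * 'received_len' is how many payload bytes actually arrived. */
    static void heartbeat_response(const uint8_t *payload, size_t received_len,
                                   uint16_t claimed_len, uint8_t *response)
    {
        /* BUG: the claimed length is trusted, so the copy can run past
         * the end of 'payload' and leak adjacent process memory. */
        memcpy(response, payload, claimed_len);
        (void)received_len;
    }

    static int heartbeat_response_fixed(const uint8_t *payload, size_t received_len,
                                        uint16_t claimed_len, uint8_t *response)
    {
        /* Fix (conceptually what the real patch did): drop any request
         * whose claimed length exceeds what was actually received. */
        if (claimed_len > received_len)
            return -1;
        memcpy(response, payload, claimed_len);
        return 0;
    }

    int main(void)
    {
        uint8_t payload[4] = { 'p', 'i', 'n', 'g' };
        uint8_t response[65536] = { 0 };

        /* A "normal" request never trips the bug... */
        heartbeat_response(payload, sizeof payload, sizeof payload, response);

        /* ...but a malicious claimed length of 65535 would over-read in the
         * buggy handler. Only the fixed handler rejects it. */
        if (heartbeat_response_fixed(payload, sizeof payload, 65535, response) < 0)
            puts("over-long heartbeat rejected");
        return 0;
    }

Ordinary traffic exercises every statement of the buggy handler and never fails, which is exactly why field exposure alone revealed nothing for two years.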
Given the rank amateurish manner in which the vast majority of software is produced these days, particularly in the open source community, there can be no substitute for truly comprehensive testing if we are to have any trust at all in that code.
— steve
On Aug 6, 2024, at 10:35 AM, Paul Sherwood <paul.sherwood at codethink.co.uk> wrote:
Hi Dewi
Please see my comments inline...
On 2024-08-06 14:27, Dewi Daniels wrote:
What do you mean by Linux, here? If you are meaning "the people who
develop Linux", then I think the reason is that they have been able to
achieve their goals for the software without compliance. In almost all
cases the developers will not have needed to consider those specific
standards at any point during their careers.
So, the problem is not that Linux cannot meet the relevant standards,
but that there is no motivation for the Linux developers to comply
with the standards?
In general that is correct, yes. And actually this concern will be
compounded (or perhaps even detonated) by the European Cyber Resilience
Act [1].
Surely, if Linux is to be used for safety
applications, there needs to be a 'Safe Linux' branch that complies
with the relevant standards.
I believe that some organisations are still attempting that, but doubt
that they will succeed.
<snip>
I accept that before I started writing safety-critical software, I
didn't necessarily test every single line of code. Dave Thomas, who I
used to work for, says in his presentation Agile is Dead • Pragmatic
Dave Thomas • GOTO 2015 (youtube.com) [1] at 18:00 that he doesn't
test everything. However, there is a big difference between safety
applications and other kinds of applications. Even IEC 61508 SIL 1 or
DO-178C Level D is a very strong claim. Since I started writing
safety-critical software, I have tended to test everything just
because it isn't that hard and doesn't take that much time.
I agree that everything should be tested.
DO-178C doesn't require you to achieve 100% structural coverage
through unit tests. It requires that your requirements-based tests
(which can be hardware/software integration tests, software
integration tests or low-level tests) cover all the software
requirements and exercise all the code. If your tests haven't covered
all the requirements, there's functionality you haven't tested. If
your tests haven't achieved statement coverage, then there's code that
you've never executed, not even once, during your testing.
I understand the argument, but this last sentence is flawed.
The fact that some statements are not covered by tests does not necessarily
mean that they have never been tested. For example the Linux community
benefits from the fact that their software is tested on billions of
devices every day.
Moreover, even in the dwindling population of systems where folks are
still aiming for 100% coverage, there will still be bugs and issues.
Coverage does not guarantee anything.
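A minimal, hypothetical sketch of that gap (the function and test values are invented for illustration): a single test executes every statement, so a statement coverage tool reports 100%, yet a defect remains for inputs the test never tries.

    #include <assert.h>

    /* Hypothetical example: the average of two counters. */
    static int average(int a, int b)
    {
        return (a + b) / 2;   /* latent defect: a + b can overflow */
    }

    int main(void)
    {
        /* This single test executes every statement, so a statement
         * coverage tool reports 100% for average(). */
        assert(average(2, 4) == 3);

        /* Never exercised: inputs near INT_MAX overflow, which is
         * undefined behaviour, yet the coverage report looks perfect. */
        return 0;
    }

Structural coverage tells you which code your tests reached, not whether the requirements, boundary cases or failure modes were actually examined.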
<snip>
Irrespective of personal preference, I would expect that if Red Hat
chooses to increase test coverage on any project, they are entirely
capable of doing so.
You claim they are capable of doing so, but they choose not to. It
comes back to the point above that they don't see a commercial
incentive to create a DO-178C or IEC 61508 variant of Red Hat Linux.
I can't comment further - I have no affiliation with Red Hat.
<snip>
There are open-source repositories that are suitable for use in
critical systems. One example is AdaCore's GNAT Ada compiler. Most
open-source projects (and I suspect most proprietary software
projects), even those sponsored by large companies, do surprisingly
little testing. For example, I reviewed part of the ARM Trusted
Computing Platform on behalf of the CHERI Standards Compliance (STONE)
project CHERI+STONE+-+Final+Report+-+for+publication+v1.0.pdf
(squarespace.com) [2]. The System Control Processor (SCP) firmware
consisted of 381,512 SLOC, but there were only 5,291 SLOC of tests.
There are very few open-source projects that do enough testing to
achieve even DO-178C Level D or IEC 61508 SIL 1.
I believe you.
<snip>
As stated, previous initiatives have failed to achieve certification,
but this does not mean that Linux has not been deployed in critical
systems.
How was it deployed in critical systems if it was not certified?
Again, not my stories to tell. But please accept that I haven't come
here to make false statements.
Route 2s: proven in use. This is not considered practicable for
complex software such as operating systems. This was recognised by
SIL2Linux.
At the risk of seeming 'insulting' again, it seems to me that the
proven in use path is practically impossible for anything involving
modern electronics, even before we get to the software.
Agreed
Excellent :-)
Route 3s: assessment of non-compliant development. This is a very
sensible way of allowing the use of an open-source operating system
such as Linux (more so than the COTS guidance in DO-278A). Route 3s
can be summarised as a. what do we know about the software? b. what
evidence do we have that it works? and c. what happens if it goes
wrong? This was the approach adopted by SIL2Linux. Yet they failed.
I'd like to understand why.
I have some understanding of why, but it is not my story to tell.
I would love to know more, since IEC 61508-3 Route 3s seemed a
sensible route for SIL2Linux to take.
I agree.
<snip>
I'm glad that you did not aim to insult. There is a big gap between
safety-critical software that complies with the standards and other
software that does not. That's always been the case. Most software
developers care about features, cost and time-to-market.
Safety-critical software developers have to show that the system is
acceptably safe before it enters service. I don't agree
with your use of the term 'modern'. The Airbus Flight Control System
(FCS) and the Linux kernel are both modern software-intensive systems.
They've been developed to meet very different goals and have therefore
been engineered differently.
I agree with everything you've said here.
Nonetheless, I believe that some open source software, including the Linux
kernel, can be shown to be suitable for use in safety-critical systems,
notwithstanding the authors' chosen processes and testing strategies.
When reasoning about software for a modern multi-core processor, with
perhaps a million lines of (uncertified) code hidden as firmware, the
behaviour is not deterministic, so test coverage will not get there.
Accepted, but that's why a million lines of uncertified code running
on a multi-core processor, whose behaviour is not deterministic, is
not suitable to be used in a safety application.
And yet, folks are doing that, and adding AI. What can possibly go wrong?
<snip>
I did not mean to suggest any lack of verification; there is a huge
amount of work done to verify every Linux release. They just don't
set much store by measuring code coverage via unit tests.
This is the aspect I have difficulties with. I accept that the Linux
developers don't set much store by measuring code coverage. DO-178C
doesn't require that you achieve structural coverage through unit
tests, by the way. But if they do as much testing as you claim in
order to verify every Linux release, it should be straightforward for
a 'Safe Linux' team to fill the gaps to achieve compliance with the
relevant standards.
Time passes, but this has not been achieved.
If the compliance gaps are large and the Linux developers
rely instead on the extremely large user base reporting any defects
quickly so they can be fixed in the next release, then that isn't good
enough for safety applications.
Can we maybe get back to the original point, with this? What is "good enough"?
- Software in practice on multicore processors exhibits stochastic behaviours
(given your experience, I would be surprised if you actually challenge this?)
- So we have to measure variances, and then the question is what tolerances
should we be aiming for?
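As a rough sketch of what measuring such variances could look like in practice (the workload, sample count and metric below are invented for illustration, not a recommendation): time a representative operation many times on the target and compute the spread, which a safety case would then have to compare against an agreed tolerance.

    /* Build (on Linux): cc -O2 jitter.c -lm  -- a sketch, not a benchmark suite. */
    #define _POSIX_C_SOURCE 199309L
    #include <math.h>
    #include <stdio.h>
    #include <time.h>

    #define SAMPLES 1000

    static volatile long sink;   /* keep the workload from being elided */

    /* Stand-in for whatever operation's timing jitter matters to the safety case. */
    static void workload(void)
    {
        long acc = 0;
        for (long i = 0; i < 100000; i++)
            acc += i;
        sink = acc;
    }

    int main(void)
    {
        double us[SAMPLES], mean = 0.0, var = 0.0;

        for (int i = 0; i < SAMPLES; i++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            workload();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            us[i] = (double)(t1.tv_sec - t0.tv_sec) * 1e6 +
                    (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
            mean += us[i];
        }
        mean /= SAMPLES;
        for (int i = 0; i < SAMPLES; i++)
            var += (us[i] - mean) * (us[i] - mean);
        var /= SAMPLES;

        /* The open question is then what spread ("tolerance") is acceptable. */
        printf("mean %.2f us, std dev %.2f us over %d runs\n",
               mean, sqrt(var), SAMPLES);
        return 0;
    }

On a loaded multi-core machine the reported spread itself varies from run to run, which is exactly the non-determinism at issue.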
You talk a lot about "state of the art".
That's because I am attempting to clarify what it is, in preparation for
safety assessments.
Linux is a widely-respected,
mature operating system, but it is not "state of the art".
Hmmm.
I consider
formal methods to be "state of the art".
I consider formal methods rather impractical, and still very much
a minority sport, more than 40 years after I was originally exposed
to them.
But yes, state of the art, of a kind.
Microsoft, Amazon Web
Services and Intel all have large formal methods teams. See, for
example, the excellent presentation by Rod Chapman, my former
colleague from Praxis: Automated Reasoning in and… | High Integrity
Software Conference 2024 (his-conference.co.uk) [3]. There is even an
open-source operating system microkernel that has been verified using
formal methods - Home | seL4 [4]
seL4 is great, but as far as I know the verification doesn't extend to
drivers, and the behaviour may still be compromised by unexpected
chip/firmware behaviour (at least, I remember reading a contrite blog post
by the seL4 team when they were caught out by speculative execution along
with everyone else). I may be wrong of course.
Prof. Peter Ladkin is one of the world's leading experts on practical
statistical evaluation of critical software. Peter has pointed out
that using statistical techniques to measure confidence in software as
complex as an operating system is a pipe dream.
Yes, this is all true. I don't agree with you, or with him. We shall see,
hopefully before I retire or drop off entirely.
best wishes
Paul