[SystemSafety] State of the art for "safe Linux"
Paul Sherwood
paul.sherwood at codethink.co.uk
Tue Aug 6 19:35:01 CEST 2024
Hi Dewi
Please see my comments inline...
On 2024-08-06 14:27, Dewi Daniels wrote:
>> What do you mean by Linux, here? If you are meaning "the people who
>> develop Linux", then I think the reason is that they have been able to
>> achieve their goals for the software without compliance. In almost all
>> cases the developers will not have needed to consider those specific
>> standards at any point during their careers.
>
> So, the problem is not that Linux cannot meet the relevant standards,
> but that there is no motivation for the Linux developers to comply
> with the standards?
In general that is correct, yes. And actually this concern will be
compounded (or perhaps even detonated) by the European Cyber Resilience
Act [1].
> Surely, if Linux is to be used for safety
> applications, there needs to be a 'Safe Linux' branch that complies
> with the relevant standards.
I believe that some organisations are still attempting that, but doubt
that they will succeed.
<snip>
> I accept that before I started writing safety-critical software, I
> didn't necessarily test every single line of code. Dave Thomas, who I
> used to work for, says in his presentation Agile is Dead • Pragmatic
> Dave Thomas • GOTO 2015 (youtube.com) [1] at 18:00 that he doesn't
> test everything. However, there is a big difference between safety
> applications and other kinds of applications. Even IEC 61508 SIL 1 or
> DO-178C Level D is a very strong claim. Since I started writing
> safety-critical software, I have tended to test everything just
> because it isn't that hard and doesn't take that much time.
I agree that everything should be tested.
> DO-178C doesn't require you to achieve 100% structural coverage
> through unit tests. It requires that your requirements-based tests
> (which can be hardware/software integration tests, software
> integration tests or low-level tests) cover all the software
> requirements and exercise all the code. If your tests haven't covered
> all the requirements, there's functionality you haven't tested. If
> your tests haven't achieved statement coverage, then there's code that
> you've never executed, not even once, during your testing.
I understand the argument, but this last sentence is flawed. The fact
that some statements are not covered by tests does not necessarily mean
that they have never been executed. For example, the Linux community
benefits from the fact that its software is exercised on billions of
devices every day.
Moreover, even in the dwindling population of systems where folks are
still aiming for 100% coverage, there will still be bugs and issues.
Coverage does not guarantee anything.
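To make the coverage point concrete, here is a deliberately trivial
sketch (the file, function and test are invented for illustration only):
a single requirements-based test can pass while gcov still reports the
error-handling statements as never executed.

/* coverage_example.c - hypothetical sketch, not from any real project.
 * Build and measure statement coverage with, for example:
 *   gcc --coverage coverage_example.c -o coverage_example
 *   ./coverage_example
 *   gcov coverage_example.c
 */
#include <stdio.h>

/* Divide a by b, falling back to 0 on division by zero. */
static int safe_div(int a, int b)
{
    if (b == 0) {
        /* Error-handling path: the test below never reaches it,
         * so gcov reports these statements as unexecuted. */
        fprintf(stderr, "division by zero\n");
        return 0;
    }
    return a / b;
}

int main(void)
{
    /* One requirements-based test of the nominal case only. */
    if (safe_div(10, 2) != 5)
        return 1;
    return 0;
}

The disagreement, of course, is whether the uncovered branch has
therefore "never been executed", or merely never been executed under
instrumentation.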
<snip>
>> Irrespective of personal preference, I would expect that if Red Hat
>> chooses to increase test coverage on any project, they are entirely
>> capable of doing so.
>
> You claim they are capable of doing so, but they choose not to. It
> comes back to the point above that they don't see a commercial
> incentive to create a DO-178C or IEC 61508 variant of Red Hat Linux.
I can't comment further - I have no affiliation with Red Hat.
<snip>
> There are open-source repositories that are suitable for use in
> critical systems. One example is AdaCore's GNAT Ada compiler. Most
> open-source projects (and I suspect most proprietary software
> projects), even those sponsored by large companies, do surprisingly
> little testing. For example, I reviewed part of the ARM Trusted
> Computing Platform on behalf of the CHERI Standards Compliance (STONE)
> project CHERI+STONE+-+Final+Report+-+for+publication+v1.0.pdf
> (squarespace.com) [2]. The System Control Processor (SCP) firmware
> consisted of 381,512 SLOC, but there were only 5,291 SLOC of tests.
> There are very few open-source projects that do enough testing to
> achieve even DO-178C Level D or IEC 61508 SIL 1.
I believe you.
<snip>
>> As stated, previous initiatives have failed to achieve certification,
>> but this does not mean that Linux has not been deployed in critical
>> systems.
>
> How was it deployed in critical systems if it was not certified?
Again, not my stories to tell. But please accept that I haven't come
here to make false statements.
>>> Route 2s: proven in use. This is not considered practicable for
>>> complex software such as operating systems. This was recognised by
>>> SIL2Linux.
>>
>> At the risk of seeming 'insulting' again, it seems to me that the
>> proven in use path is practically impossible for anything involving
>> modern electronics, even before we get to the software.
>
> Agreed
Excellent :-)
>>> Route 3s: assessment of non-compliant development. This is a very
>>> sensible way of allowing the use of an open-source operating system
>>> such as Linux (more so than the COTS guidance in DO-278A). Route 3s
>>> can be summarised as a. what do we know about the software? b. what
>>> evidence do we have that it works? and c. what happens if it goes
>>> wrong? This was the approach adopted by SIL2Linux. Yet they failed.
>>> I'd like to understand why.
>>
>> I have some understanding of why, but it is not my story to tell.
>
> I would love to know more, since IEC 61508-3 Route 3s seemed a
> sensible route for SIL2Linux to take.
I agree.
<snip>
> I'm glad that you did not aim to insult. There is a big gap between
> safety-critical software that complies with the standards and other
> software that does not. That's always been the case. Most software
> developers care about features, cost and time-to-market.
> Safety-critical software developers have to show that the system can
> be shown to be acceptably safe before it enters service. I don't agree
> with your use of the term 'modern'. The Airbus Flight Control System
> (FCS) and the Linux kernel are both modern software-intensive systems.
> They've been developed to meet very different goals and have therefore
> been engineered differently.
I agree with everything you've said here.
Nonetheless, I believe that some open source software, including the
Linux kernel, can be shown to be suitable for use in safety-critical
systems, notwithstanding the authors' chosen processes and testing
strategies.
>> When reasoning about software for a modern multi-core processor, with
>> perhaps a million lines of (uncertified) code hidden as firmware, the
>> behaviour is not deterministic, so test coverage will not get there.
>
> Accepted, but that's why a million lines of uncertified code running
> on a multi-core processor, whose behaviour is not deterministic, is
> not suitable to be used in a safety application.
And yet, folks are doing that, and adding AI. What can possibly go
wrong?
<snip>
>> I did not mean to suggest any lack of verification; there is a huge
>> amount of work done to verify every Linux release. They just don't set
>> much store in measuring code coverage via unit tests.
>
> This is the aspect I have difficulties with. I accept that the Linux
> developers don't set much store in measuring code coverage. DO-178C
> doesn't require that you achieve structural coverage through unit
> tests, by the way. But if they do as much testing as you claim to
> verify every Linux release, it should be straightforward for a 'Safe
> Linux' team to fill the gaps to achieve compliance with the relevant
> standards.
Time passes, but this has not been achieved.
> If the compliance gaps are large and the Linux developers
> rely instead on the extremely large user base reporting any defects
> quickly so they can be fixed in the next release, then that isn't good
> enough for safety applications.
Can we maybe get back to the original point with this? What is "good
enough"?
- Software on multicore processors exhibits stochastic behaviours in
  practice (given your experience, I would be surprised if you actually
  challenged this?).
- So we have to measure variances, and then the question is what
  tolerances we should be aiming for - see the sketch below.
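As a minimal sketch of what I mean by measuring variances (the workload,
sample count and reported statistics are arbitrary choices for
illustration, not a proposal):

/* latency_variance.c - minimal sketch, assuming a dummy workload stands
 * in for the real task and that 10000 samples is enough to see the
 * spread. Build: gcc -O2 latency_variance.c -o latency_variance -lm
 */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define SAMPLES 10000

static volatile uint64_t sink;

/* Dummy workload standing in for the code under assessment. */
static void workload(void)
{
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 100000; i++)
        acc += i * i;
    sink = acc;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

int main(void)
{
    double sum = 0.0, sumsq = 0.0, worst = 0.0;

    for (int i = 0; i < SAMPLES; i++) {
        double start = now_ns();
        workload();
        double dt = now_ns() - start;
        sum += dt;
        sumsq += dt * dt;
        if (dt > worst)
            worst = dt;
    }

    double mean = sum / SAMPLES;
    double var = sumsq / SAMPLES - mean * mean;

    /* The open question is what tolerance on these figures would be
     * acceptable for a given safety function. */
    printf("mean %.0f ns, std dev %.0f ns, worst %.0f ns\n",
           mean, sqrt(var), worst);
    return 0;
}

Run something like this pinned to different cores, with and without load
on the neighbouring cores, and the variance question becomes very
visible.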
> You talk a lot about "state of the art".
That's because I am attempting to clarify what it is, in preparation for
safety assessments.
> Linux is a widely-respected,
> mature operating system, but it is not "state of the art".
Hmmm.
> I consider
> formal methods to be "state of the art".
I consider formal methods rather impractical, and still very much
a minority sport, more than 40 years after I was originally exposed
to them.
But yes, state of the art, of a kind.
> Microsoft, Amazon Web
> Services and Intel all have large formal methods teams. See, for
> example, the excellent presentation by Rod Chapman, my former
> colleague from Praxis: Automated Reasoning in and… | High Integrity
> Software Conference 2024 (his-conference.co.uk) [3]. There is even an
> open-source operating system microkernel that has been verified using
> formal methods - Home | seL4 [4]
seL4 is great, but as far as I know the verification doesn't extend to
drivers, and the behaviour may still be compromised by unexpected
chip/firmware behaviour (at least, I remember reading a contrite blog
post by the seL4 team when they were caught out by speculative execution
along with everyone else). I may be wrong of course.
> Prof. Peter Ladkin is one of the World's leading experts on practical
> statistical evaluation of critical software. Peter has pointed out
> that using statistical techniques to measure confidence in software as
> complex as an operating system is a pipe dream.
Yes, this is all true. I don't agree with you, or with him. We shall
see, hopefully before I retire or drop off entirely.
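For anyone who wants the arithmetic behind Peter's position, the
standard back-of-envelope argument looks roughly like this (exponential
failure model; the target rate and confidence level are examples only):

/* pfh_sketch.c - sketch of the standard statistical argument: under an
 * exponential failure model, claiming a failure rate no worse than
 * lambda per hour with confidence C needs roughly
 *   t >= -ln(1 - C) / lambda
 * hours of failure-free operation. Figures below are examples only.
 * Build: gcc pfh_sketch.c -o pfh_sketch -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double lambda = 1e-9;     /* example target: 1e-9 failures/hour */
    double confidence = 0.99; /* example confidence level */

    double hours = -log(1.0 - confidence) / lambda;

    printf("failure-free operating hours required: %.3g (~%.0f years)\n",
           hours, hours / (24.0 * 365.0));
    return 0;
}

Whether that is the only admissible form of evidence is, I think, where
we differ.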
best wishes
Paul