[SystemSafety] State of the art for "safe Linux"

Mon Aug 5 19:32:50 CEST 2024

Hi Dewi,

Thank you for your feedback. Please see my comments inline below...

On 2024-08-05 15:42, Dewi Daniels wrote:
> You're missing the Enabling Linux in Safety Applications (ELISA)
> project [1] and EB corbos Linux for Safety Applications from
> Elektrobit [2].

I skipped ELISA because as far as I know there are no concrete findings 
or cited research from ELISA to date; I may be wrong of course. My 
understanding is that the EB project concentrates on demonstrating 
safety via a hypervisor, i.e. the Linux component is treated as QM.

> I agree with Andrew Banks when he asks what do you mean by Linux?

I hope my response to Andrew was clear, but to elaborate, Linux
officially means just the kernel. In practice, any real system
requires a boot loader, drivers, init process etc., as well as the
various tools and libraries required to compile it. People commonly use
the term 'Linux' to refer to a whole operating system, comparable to e.g.
macOS or Windows, but this is technically incorrect. Some folks refer to
GNU/Linux as a way of indicating the non-kernel components. I often use
the phrase "Linux-based OS". As I said in my response to Andrew, from a
safety perspective we need to consider the whole supply chain.

> What do you mean by certification authorities? Do you mean aviation,
> rail, automotive, nuclear, medical, process control? The regulatory
> regimes are very different.

I mean organisations that offer certification services, normally based 
on standards. Examples would be the TÜV organisations, exida, UL.

> I don't understand why it's so hard for Linux to just comply with
> standards such as RTCA DO-178C/EUROCAE ED-12C? DO-178C Level D is just
> Software Engineering 101.

What do you mean by Linux, here? If you mean "the people who
develop Linux", then I think the reason is that they have been able to 
achieve their goals for the software without compliance. In almost all 
cases the developers will not have needed to consider those specific 
standards at any point during their careers.

> Even Level C isn't that hard. Why wouldn't
> you want to achieve statement coverage?

Actually, I wouldn't, because of my personal bias. I've successfully 
delivered small amounts of critical code (in the few thousands of LoC 
range) without unit tests, for systems which then worked for years 
without problems. On many projects I've had more success with system
tests than with unit tests.

> I remember attending a talk by
> the CEO of Red Hat Linux UK, who admitted that most of their
> developers are volunteers and prefer to spend their time coding rather
> than writing tests.

While it is true that most of Red Hat's engineers contribute voluntarily
to open source projects in their spare time, it is also true that Red 
Hat pays some thousands of engineers to work full-time on contributing 
to the Linux kernel and a huge range of open source projects. 
Irrespective of personal preference, I would expect that if Red Hat 
chooses to increase test coverage on any project, they are entirely 
capable of doing so.

> The open-source repositories that I've inspected
> contain alarmingly few tests.

Well, I think there are hundreds of millions of open source repositories 
on GitHub, and I'm sure almost all of them are clearly not suitable for 
use in critical systems. But there are some thousands of open source 
projects that have been developed and maintained to extremely high 
standards.

I would note that these days even proprietary (and certified) programs 
rely heavily on open source software. Microsoft ships Linux with every 
version of Windows. QNX is compiled with GCC.

> Another approach is to follow the COTS guidance in RTCA
> DO-278A/EUROCAE ED-109A. It was specifically written to allow the use
> of COTS software such as operating systems. I understand that
> EUROCONTROL and NATS have been deploying CNS/ATM systems based on UNIX
> for many years. RTCA SC-240/EUROCAE WG-117 is working on better
> guidance on the use of COTS and Open-Source Software (OSS) in
> aviation.

My recent work has been confined to the automotive industry, so I have 
no direct experience with the aerospace standards, but I'm pleased to 
learn this.

> IEC 61508 provides three compliance routes:
> 
> Route 1s: compliant development. Why is it so hard for Linux to just
> comply with IEC 61508? The requirements for SC1 or SC2 are not very
> onerous.

As previously stated, the people developing the Linux kernel in general
have little or no experience of (or need to comply with) any specific
standard. Red Hat is actively working on safety for their RHIVOS
product, focusing on ISO 26262. Perhaps they will achieve compliance, 
but I suspect there may be a lot of 'tailoring'.

As stated, previous initiatives have failed to achieve certification, 
but this does not mean that Linux has not been deployed in critical 
systems.

> Route 2s: proven in use. This is not considered practicable for
> complex software such as operating systems. This was recognised by
> SIL2Linux.

At the risk of seeming 'insulting' again, it seems to me that the proven 
in use path is practically impossible for anything involving modern 
electronics, even before we get to the software.

> Route 3s: assessment of non-compliant development. This is a very
> sensible way of allowing the use of an open-source operating system
> such as Linux (more so than the COTS guidance in DO-278A). Route 3s
> can be summarised as a. what do we know about the software? b. what
> evidence do we have that it works? and c. what happens if it goes
> wrong? This was the approach adopted by SIL2Linux. Yet they failed.
> I'd like to understand why.

I have some understanding of why, but it is not my story to tell.

> Your post is insulting to certification authorities and those of us
> who participate in standards committees.

I'm sorry you feel that way. My aim was not to insult, but to draw 
attention to the gaps which clearly exist between the standards and the 
way modern software-intensive systems are actually engineered.

> You imply that the
> certification authorities are being unreasonable and that they should
> just allow people to use Linux.

I certainly did not say any such thing, nor did I intend to imply it. 
Given experience of single-threaded microcontroller type systems with 
expectations of deterministic behaviour, many of the ideas in the 
standards are entirely reasonable.

When reasoning about software for a modern multi-core processor, with
perhaps a million lines of (uncertified) code hidden as firmware, the
behaviour is not deterministic, so test coverage alone will not get us
there.

> I don't agree with your conclusion. If
> Linux is as complicated as you say, they need to do more verification,
> not less.

I did not mean to suggest any lack of verification; there is a huge
amount of work done to verify every Linux release. The developers just
don't set much store by measuring code coverage via unit tests.

> I don't see how you can use statistical techniques to
> measure confidence in software as complex as an operating system.

Understood. Hopefully we and others will manage to demonstrate that in 
spite of the doubts.

> I'm reminded of the quote from Tony Hoare, "There are two ways of
> constructing a software design: One way is to make it so simple that
> there are obviously no deficiencies, and the other way is to make it
> so complicated that there are no obvious deficiencies. The first
> method is far more difficult". I'm also reminded of the quote from the
> NTSB report on one of the Tesla accidents, "Just because you can
> doesn't mean you should".

Indeed. Thank you again for your comments.

br
Paul