[SystemSafety] CbyC and unit testing
Olwen Morgan
olwen at phaedsys.com
Fri Jul 3 16:20:29 CEST 2020
All,
I've been exploring this issue offline with others on the list but now
have to attend to a pressing commitment elsewhere. Because I think the
issue is very important, I'll summarise my objections to omission of
unit-testing (UT) here and leave others to shoot at them (especially Rod
Chapman, who knows a lot more about CbyC in iFacts than I do):
1. As I understand it, PBL's position is, in the symbolism of
(classical) modal logic: (Use-CbyC -> ¬[]Perform-UT). That is, if you
use CbyC development methods it is not necessary to do unit testing.
My position is: ¬[](Use-CbyC -> ¬[]Perform-UT). That is to say I do
not think it necessarily the case that use of CbyC renders UT unnecessary.
I'll leave any quibbles about alethic or epistemic interpretation
of modal logic to others.
2. I do not believe that experience of CbyC on the Altran iFacts Air
Traffic Control (ATC) project is sufficient to sustain the claim that
CbyC renders UT unnecessary. My reasons for this include:
(a) The aim of ATC is to maintain safe separation of aircraft. Yet
the Z definitions of the UK airspace used by iFacts specified it as a
set of non-convex volumes intersecting only at their boundaries. These
definitions did not even give a specific topology, still less define a
metric space. In fact the Z definitions did not even contain the
geographic coordinates of the vertices at sector boundaries. In these
circumstances, how can one claim that the notions of location and
distance between aircraft are even defined by the Z specifications? A
coder of a software unit that dealt with location and distance would,
presumably, have to introduce the relevant coordinates and metric
notions into the SPARK proof annotations? For if they were not given in
the Z functional specification, how else could they be introduced? And
if the coordinate system and distance metric were added by the coder for
the units that he was coding, how were these notions explicitly
traceable to the Z specifications? In fact, given the weakness of the Z
airspace specifications, how could incorrect location and metric
assumptions possibly contradict it? If the formal functional spec was
that weak, CbyC was correspondingly limited in the strength of the
assertions that it could prove about the code - and that is quite apart
from any issues of whether UT is necessary.
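For illustration only (this is my own Python sketch, nothing from iFacts; the coordinate frame, the degrees-to-nautical-miles conversion, the flat-earth projection and the 5 NM minimum are all assumptions I've pulled in purely for the example), here is how notions of location and distance that are absent from a formal spec end up being introduced by the coder:

```python
import math

# Every numerical notion below is a coder-introduced assumption. If the
# formal airspace spec defines neither coordinates nor a metric, none of
# this is traceable to it - which is exactly the problem.

NM_PER_DEG_LAT = 60.0  # assumed spherical-earth conversion to nautical miles

def separation_nm(a, b):
    # a, b: (lat_deg, lon_deg) aircraft positions.
    # Local flat-earth approximation - adequate only over short ranges,
    # itself another assumption the spec does not constrain.
    dlat = (b[0] - a[0]) * NM_PER_DEG_LAT
    mean_lat = math.radians((a[0] + b[0]) / 2.0)
    dlon = (b[1] - a[1]) * NM_PER_DEG_LAT * math.cos(mean_lat)
    return math.hypot(dlat, dlon)

def safely_separated(a, b, minimum_nm=5.0):
    # The 5 NM lateral minimum is an operational parameter that must come
    # from somewhere outside an underspecified geometric model.
    return separation_nm(a, b) >= minimum_nm
```

The point is not the arithmetic but the provenance: the conversion factor, the projection and the minimum all live in the code, not the Z.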
(b) The iFacts code was not all written in SPARK Ada. AFAI recall (Rod C
may correct me) there were about 100kloc of SPARK Ada and 80kloc of C.
The C bits were not, AFAI am aware, developed using a CbyC process. In
my recollection the most powerful static analysis that was performed on
them was MISRA C conformance checking (done by QAC with yours truly
examining the results). MISRA conformance was subject to a large number
of deviation concessions because the C code used X libraries, which are
not fully MISRA C compliant. If you have CbyC for only 56% of the code,
how can you claim that CbyC has rendered unit-testing unnecessary? The
closest you can get is to provide corresponding input and output
predicates for calls to C functions from Ada. But there is no white-box
proof that a called C routine actually behaves as its input-output
predicate specification claims. In what sense, then, could anyone
reasonably claim that CbyC rendered UT unnecessary for the C code?
(c) Location and distance calculation necessarily involves computation
with trigonometric functions. How was it assured that these produced
correct results close to bearings at which these functions take the
value zero? Did the Z spec take account of problems of stability in the
numerical calculations? Did they, for example, use notions of interval
arithmetic? Were the relevant verification conditions placed in the
SPARK annotations? If not, how did the coders provide for numerical
stability and make their provisions traceable to the Z spec?
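To see why this matters, here is a hypothetical Python sketch (again, mine, not anything from iFacts): two mathematically equivalent great-circle distance formulae behave very differently close to zero separation, which is precisely the regime that matters for separation monitoring:

```python
import math

R = 6_371_000.0  # assumed mean Earth radius in metres

def dist_law_of_cosines(lat1, lon1, lat2, lon2):
    # Spherical law of cosines (radians): mathematically correct, but the
    # cosine argument saturates at 1.0 for small separations, so the
    # distance information is rounded away before acos ever sees it.
    a = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return R * math.acos(min(1.0, a))

def dist_haversine(lat1, lon1, lat2, lon2):
    # Haversine form: algebraically equivalent, numerically stable near zero.
    h = (math.sin((lat2 - lat1) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2)
           * math.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * R * math.asin(math.sqrt(h))

dlon = 0.001 / R  # two positions about 1 mm apart on the equator
print(dist_law_of_cosines(0.0, 0.0, 0.0, dlon))  # 0.0 - distance vanishes
print(dist_haversine(0.0, 0.0, 0.0, dlon))       # ~0.001 m
```

A formal spec that merely says "compute the great-circle distance" validates both; only a numerical-stability requirement (or a unit test near the danger points) distinguishes them.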
(d) The iFacts system is highly concurrent. Did the CbyC provide for
verification of relevant concurrency properties? I didn't look in detail
at all of the Z specification, but I'd seriously question the capacity
of Z to capture the relevant requirements in its action-system style of
modelling (but, please, RC, correct me if I've got this wrong).
(e) AFAI gather from RC, CbyC in iFacts did only *partial* correctness
proofs. In such circumstances, UT of code with profiling instrumentation
may provide the first opportunity to detect non-termination.
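A crude illustration of that last point (hypothetical Python; a step-budget counter stands in for profiling instrumentation): a partial-correctness proof can establish that *if* the unit returns, its answer is right, while saying nothing about whether it returns at all - and an instrumented unit test catches exactly that gap:

```python
class FuelExhausted(Exception):
    """Raised when the unit under test exceeds its step budget."""

def search(xs, target, fuel=1_000_000):
    # Hypothetical unit: IF this returns, the returned index is correct
    # (that is all a partial-correctness proof would claim).
    i = 0
    while True:
        if fuel == 0:
            raise FuelExhausted("possible non-termination detected")
        fuel -= 1
        if xs[i % len(xs)] == target:
            return i % len(xs)
        i += 1  # when the target is absent, the loop never exits on its own

print(search([1, 2, 3], 2))  # 1: terminates when the target is present
try:
    search([1, 2, 3], 9, fuel=10_000)
except FuelExhausted as e:
    print(e)  # the instrumented test flags the missing-target case
```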
3. Upon what properties of the elements of the tool chain is CbyC relying?
(a) On compiler correctness? Was the compiler developed using CbyC? Does
the language standard even provide a basis for CbyC development of the
compiler? Does one know that the compiler has been tested under the
particular compile-time options that one's project is actually using (a
fearful bear trap with, for example, cross compilers for embedded C)?
How does one ensure, with SPARK Ada for example, that the compiler does
not do perverse things such as compiling iteration as recursion - which,
AFAI can make out from what RC says, the Ada RM may not actually
prohibit, again raising issues of traceability?
(b) On the IDE used? Has it undergone CbyC development? How thoroughly
has it been tested? How does it maintain correct configuration of the
tools to which it provides access? I've seen some truly horrendous
blunders in C tool configuration. IMHO, tool configuration is a grossly
neglected area of the application of formal methods.
(c) Unless the relied-upon properties of toolchain elements are made
explicit, it is hard to sustain the claim that CbyC is even being done
correctly, leave alone that it is obviating testing.
4. Is CbyC based on a formal spec that captures relevant non-functional
(NF) requirements? If not, then UT may be the earliest opportunity to
detect that they are not met. Examples include:
(a) speed of operation of a coded unit: profiling UT will determine this
very easily,
(b) satisfactory interoperation of a coded unit with other pre-existing
system elements: if not checkable strictly at the UT level, this can be
checked by minimal integration tests.
(c) usability of a coded unit: AFAI can see formal methods are quite a
long way off being practically helpful in getting HMI right. I'd *love*
to see a model of human cognition in Z! :-O
(d) security of a coded unit: I've yet to see a formal specification of
software that would allow one to prove that the security of a wired
connection is inferior to that of an optical connection because of the
possibility of EM emanation. If anyone here has, say in Z, a formulation
of Maxwell's equations and a quantum electrodynamical model of light
propagation in optical fibres, I'd really love to see it.
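On point (a), a minimal sketch of what such a profiling unit test might look like in Python (the unit, the harness and the time budget are all illustrative assumptions of mine):

```python
import time

def unit_under_test(n):
    # Stand-in for a coded unit whose execution speed is an NF requirement.
    return sum(i * i for i in range(n))

def best_time(fn, *args, repeats=5):
    # Minimal profiling harness: time several runs, keep the best, and let
    # the unit test compare it against a project-specific budget.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

elapsed = best_time(unit_under_test, 100_000)
# In a real project the budget would be traceable to the NF requirement:
BUDGET_SECONDS = 0.5  # illustrative figure only
assert elapsed < BUDGET_SECONDS
print(f"best of 5 runs: {elapsed:.6f} s")
```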
To summarise the foregoing, I say:
(a) We cannot ignore the limited universe problem: Formal methods do not
allow us to prove anything about details from which they abstract. UT,
being the sordidly practical, unabstracted real thing, gives us the
earliest opportunity to catch problems that may arise from over-abstract
specifications, some aspects of which I have alluded to above.
(b) I would be prepared to concede that something like:
((Use-CbyC & P1 & ... & Pi ...) -> ¬[]Perform-UT) may be tenable
where the Pi are comprehension premises that address the problems of
tool chain integrity such as I have referred to above. I do not,
however, believe that:
(Use-CbyC -> ¬[]Perform-UT) is tenable as it stands. There are,
IMHO, too many implicit things that can go wrong for it to be a safe
claim. AFAI can see the only way that proposition is helpfully tenable
is if you can explicitly prove Use-CbyC -> Pi for each agreed Pi in my
conceded formulation.
(c) I've never yet encountered a systems engineer who does not believe
that the earlier you detect problems, the easier and cheaper they are to
correct. I think that there are circumstances in which, even if CbyC has
been used, UT is the earliest point in the life cycle at which problems
not detectable by CbyC can be detected by other means. Does one really
wish, by fiat, to throw this detection opportunity away?
Don't get me wrong. I'd like to see CbyC much more widely used. But I'd
also like to see its practitioners avoid the kind of ill-considered,
hyped-up claims with which the AI community has so often blown its own
arse off.
Hope this provides more food for thought,
Heading for afternoon tea,
Olwen