[SystemSafety] CbyC and unit testing

Tue Jul 7 16:43:24 CEST 2020

On 06/07/2020 19:36, Olwen Morgan wrote:

> I regard the notion of "rigorous inductive proof" as meaningless in
> this context. I have difficulty with the application of the term
> "rigorous" to testing since it invites, IMO, a false comparison with
> the rigour with which CbyC proofs are carried out. My own personal
> practice is, as I have said, to code my units such that any set of
> tests that achieves 100% strong, robust, boundary value coverage
> (taking both formally-referenced-black-box and white-box-internal data
> into account) also achieves 100% simple path coverage. This is to
> assure a very high-level of relevant coverage in testing and typically
> involves a pseudo-single-assignment style in which every variable is
> declared in the smallest possible scope, is initialised at declaration
> and is threatened by assignment in at most one point in its scope. It
> tends to resemble functional programming in structure and is indeed
> intended to do so on the assumption (possibly unwarranted) that it
> tends to make life easier for any prover. This is a major
> consideration in C where, if you are striving for best attainable
> practice, you might be using several different verification tools.
> AFAI can see, this is a far stronger requirement than I've ever heard
> anyone else use or advocate. If this was what CARH was getting at in
> the notion of "rigorous inductive proof", then that's what I'm doing.
> I reached this practice as a result of developing a somewhat
> technically paranoid mindset when working in C system development. 

Here's the extract from the 1969 NATO Software Engineering conference
where Tony Hoare made that remark. Hoare is followed by a sceptical
Perlis. It's helpful that Hoare, Dijkstra and others are able to join in
our discussions so pertinently, half a century later, though I find it
profoundly depressing that the same discussion is still needed.

Martyn

Hoare: One can construct convincing proofs quite readily of the ultimate
futility of exhaustive testing of a program and even of testing by
sampling. So how can one proceed? The role of testing, in theory, is to
establish the base propositions of an inductive proof. You should
convince yourself, or other people, as firmly as possible that if the
program works a certain number of times on specified data, then it will
always work on any data. This can be done by an inductive approach to
the proof. Testing of the base cases could sometimes be automated. At
present, this is mainly theory; note that the tests have to be designed
at the same time as the program and the associated proof is a vital part
of the documentation. This area of theoretical work seems to show a
possibility of practical results, though proving correctness is a
laborious and expensive process. Perhaps it is not a luxury for certain
crucial areas of a program.

Perlis: Much of program complexity is spurious and a number of test
cases properly studied will exhaust the testing problem. The problem is
to isolate the right test cases, not to prove the algorithm, for that
follows after the choice of the proper test cases.

Dijkstra: Testing shows the presence, not the absence of bugs.

Lucas gave a brief summary of the Vienna method (see P. Lucas and K
Walk: “On the formal description of PL/1” Annual Review in Automatic
Programming, Vol. 6). He gave an example of an application of the Vienna
method, which revealed an error in the mechanism for dynamic storage
allocation of AUTOMATIC variables in the F-level PL/I compiler (see P.
Lucas: “Two constructive realizations of the block concept and their
equivalence” TR 25.085; also W. Hanahpl: “A proof of correctness of the
reference mechanism to automatic variables in the F-Compiler” LN
25.3.048; both of which are available from the IBM Vienna Laboratory).

Lucas: The error was not found by the compiler writers; it was not found
by product test and it would not easily have been found by any random
tests. I am quite convinced that making this proof was cheaper than the
discussions I have heard among highly-paid people on whether or not this
allocation mechanism would cover the general case.

22

Galler: I think this relates to a common mathematical experience; when
you cannot prove a theorem very often the sticking-point is the source
of the counter-example.

Llewellyn: How often in practice do these obscure errors turn up?

Lucas: Nobody knows how often these cases turn up; that is just the
point. A compiler writer is not a prophet of all computer users.
Secondly, even if only one user meets the error it causes him a real
problem and it causes a full system update mechanism to go into action.
Even for only one such case, it is cheaper to make the proof than mend
the bug.

Galler: Engineers use the “worst case” approach to testing. Has this
been done in software?

Lucas: I can’t see any possibility of specifying worst cases in this
kind of field.

Reynolds: The difficulty with obscure bugs is that the probability of
their appearance increases with time and with wider use.

Dijkstra described some basic features of his work over the past several
years on the problem of making programming an order of magnitude easier
and more effective. This is described in full in his working paper in
section 7.4.

Perlis: Do you think the methods you have developed can be taught to
those programmers who have always coded first and justified last?

Dijkstra: No one survives early education without some scars. In my
experience, the intellectually degrading influence of some educational
processes is a serious bar to clear thinking. It is a requirement for my
applicants that they have no knowledge of FORTRAN. I am not being
facetious.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.techfak.uni-bielefeld.de/pipermail/systemsafety/attachments/20200707/31043e7b/attachment.html>