[SystemSafety] Fwd: The evidence base
Olwen Morgan
olwen at phaedsys.com
Fri Oct 26 12:02:26 CEST 2018
To get evidence of whether particular practices promote software
dependability, one needs to take measurements. This immediately brings
us up against the problem of a non-existent metrological base. Here's a
very simple example:
Back in the 1990s I was involved in a project to migrate 10m lines of
telecomms network management code from a 32-bit to a 64-bit platform.
The question arose of whether the existing regression tests would be
sufficient to detect errors after the migration. To examine the problem,
a colleague and I ran the code through three tools to extract a set of
testability metrics. The tools were QAC, CANTATA++ and the McCabe toolset.
The McCabe cyclomatic complexity was taken as a basic testability
yardstick because it gave a very coarse lower bound on how big an
adequate set of test cases should be. (It turned out that the existing
regression test set had nowhere near as many tests as you'd need even to
get full coverage of all linearly independent paths.)
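For anyone who hasn't used the yardstick, a minimal worked example may
help (Python here for brevity, though the migrated code was C; the
function is invented for illustration):

    # Toy function with three decision points.
    def classify(reading, threshold, alarm_enabled):
        if reading is None:            # decision 1
            return "no-data"
        if reading > threshold:        # decision 2
            if alarm_enabled:          # decision 3
                return "alarm"
            return "over-threshold"
        return "normal"

    # McCabe cyclomatic complexity: V(G) = decisions + 1 = 4
    # (equivalently E - N + 2 over the control-flow graph), so a
    # basis set of linearly independent paths needs at least 4 test
    # cases. It is a coarse bound: it says nothing about data
    # values, loop iterations or path feasibility.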
More worrying, though, was that the three tools did not agree on the
measured values of cyclomatic complexity. McCabe's toolset computed it
according to his original definition; Cantata computed the Myers
modification; and QAC reported the original cyclomatic complexity and
the Myers correction separately. As far as I recall, there were also
further odd discrepancies here and there across the analysed code base.
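The kind of disagreement involved is easy to reproduce on a contrived
fragment (again Python, invented for illustration):

    def accept(a, b, c):
        if a and b:    # one decision node, two boolean conditions
            return 1
        if c:          # one decision node, one condition
            return 2
        return 0

    # McCabe's original definition counts decision nodes:
    #   V(G) = 2 + 1 = 3
    # Myers' extension counts individual conditions (the upper end
    # of his interval measure):
    #   V'(G) = 3 + 1 = 4
    # Two tools can thus report 3 and 4 for the same function, each
    # correct by its own definition.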
This points to an obvious underlying problem. Whereas there are
standards of measurement for physical quantities, there are no standards
of measurement for software metrics. Making such standards is actually
not difficult. You simply use a syntax-based metanotation to define,
for a given language, how measurement values are computed for programs
in that language. At the time I devised such a metanotation and proposed a
project to write a tool to use it to produce at least local reference
realisations of standards for the metrics used. It didn't happen and I
shelved the idea.
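The underlying idea was nothing deeper than syntax-directed
definition: attach an explicit counting rule to each construct of the
language grammar, so the metric is defined once rather than
re-invented inside each vendor's tool. A rough sketch of the flavour,
using Python's own ast module as a stand-in reference realisation (the
rule choices below are illustrative only, not the metanotation I
actually devised):

    import ast

    # One counting rule per syntactic construct: making this table
    # explicit is what a reference realisation would standardise.
    DECISION_NODES = (ast.If, ast.While, ast.For, ast.IfExp,
                      ast.ExceptHandler)

    def cyclomatic(source: str) -> int:
        tree = ast.parse(source)
        decisions = sum(isinstance(node, DECISION_NODES)
                        for node in ast.walk(tree))
        return decisions + 1

    print(cyclomatic("if x:\n    y = 1\nelse:\n    y = 2\n"))  # -> 2

Written this way, whether to count each operand of a compound boolean
(the Myers variant) becomes a visible choice in the rule table rather
than an undocumented accident of implementation.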
The problem is worse for measuring things such as project effort,
where several confounding factors can skew basic timesheet-based
measurements.
The more I see of attempts to justify practice A as promoting
dependability property B, the more disillusioned I get. There is no hope
of sound findings without sound practices of measurement and a
supporting system of metrology. In this respect, just as economics is
the dismal science, so software development is the dismal engineering.
Olwen