[SystemSafety] Kolmogorov complexity measures
Olwen Morgan
olwen at phaedsys.com
Mon Nov 19 20:57:54 CET 2018
First a quote from WikiPedia:
"Inalgorithmic information theory
<https://en.wikipedia.org/wiki/Algorithmic_information_theory>(a
subfield ofcomputer science
<https://en.wikipedia.org/wiki/Computer_science>andmathematics
<https://en.wikipedia.org/wiki/Mathematics>), the*Kolmogorov
complexity*of an object, such as a piece of text, is the length of the
shortestcomputer program
<https://en.wikipedia.org/wiki/Computer_program>(in a
predeterminedprogramming language
<https://en.wikipedia.org/wiki/Programming_language>) that produces the
object as output. It is a measure of thecomputational
<https://en.wikipedia.org/wiki/Computation>resources needed to specify
the object, and is also known as*descriptive
complexity*,*Kolmogorov–Chaitin
<https://en.wikipedia.org/wiki/Gregory_Chaitin>complexity*,*algorithmic
complexity*,*algorithmic entropy*, or*program-size complexity*. It is
named afterAndrey Kolmogorov
<https://en.wikipedia.org/wiki/Andrey_Kolmogorov>, who first published
on the subject in 1963.^[1]
<https://en.wikipedia.org/wiki/Kolmogorov_complexity#cite_note-1> ^[2]"
<https://en.wikipedia.org/wiki/Kolmogorov_complexity#cite_note-2>
^(Wikipedia should be ashamed for not calling him, "Andrei Nikolaevich
Kolmogorov". Incorrect transliteration and omission of patronymics is
inexcusable.)
^Now a C program that attempts to obey in C the kinds of rules that
you'd have to follow in SPARK Ada. Look at the codings for gcd1 and gcd2:
^
#include <stdio.h>
#include <assert.h>
/* function to compute gcd of two positive integers a, b */
int gcd1(int a, int b)
{
/* note the single-assignment style:
*
* (1): p, q, r are all defined in the smallest possible scope,
* (2): p, q, r are all initialised at declaration, and
* (3): each of p, q, r is modified in exactly one place in their
respective scopes.
*
* These usages are intended to make the code easy to analyse.
*/
int p = a;
int q = b;
int r = p%q;
while (r > 0) /*The local loop invariant can be written down at
sight from this code */
{
p = q;
q = r;
r = p%q;
}
/* Note: it doesn't matter if a < b when the function is called,
since the
* first iteration through the loop will just swap them and then
carry on
*/
return q;
}
/* recursive version: shorter than gcd1 but not admissible
* in circumstances where recursive coding is prohibited
*/
int gcd2(int a, int b)
{
int r = a%b;
return (r == 0 ? b : gcd2(b, r));
}
int main (void)
{
int s = 35;
int t = 77;
assert(t != 0); /* halts if t == 0 to avoid division by 0 in
functions */
printf("gcd1(%i, %i) = %i\n", s, t, gcd1(s, t));
printf("gcd2(%i, %i) = %i\n", s, t, gcd2(s, t));
return 0;
}
Obviously the recursive implementation would be banned in critical code
to ensure that storage requirements are compile-time computable. So gcd2
will not be regarded as acceptable for estimating Kolmogorov complexity.
BUT
can anyone come up with a shorter one than gcd1 that complies with
reasonable C language subset requirements for critical systems development?
I'm guessing that it's reasonable to measure length in terms of the
number of lexical tokens (otherwise you'd have to regard as more complex
an implementation that used longer variable names). If you used the same
algorithm as gcd1 in SPARK Ada, would it be longer based on lexeme count?
Just an amusing diversion for techies,
Olwen
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.techfak.uni-bielefeld.de/mailman/private/systemsafety/attachments/20181119/decb7b11/attachment-0001.html>
More information about the systemsafety
mailing list