[SystemSafety] An LLM generated C compiler

Derek M Jones derek at knosof.co.uk
Tue Feb 24 11:56:06 CET 2026


All,

An LLM has generated a C compiler.
This is interesting because it involves connecting many
complicated moving parts and is relatively large (100K LOC).

How does the compiler perform?
It performs well when handling correct C source, and the
generated code works (for my small test cases).
It does very poorly at detecting semantically incorrect C.
This is not surprising given that the bulk of the publicly
available C source it was trained on does not contain semantic errors.

At least one of the optimizations does nothing (based on a few basic
tests), even though code claiming to perform the optimization exists.
Figuring out that the optimization is not happening is a very different
skill set from implementing the algorithm to perform it.

While the compiler source has been released, the prompts used have not
been released.  I suspect that the prompts contain an awful lot of
detailed instructions written by somebody who knows lots about compilers.

More analysis here:
https://shape-of-code.com/2026/02/22/investigating-an-llm-generated-c-compiler/

-- 
Derek M. Jones           Evidence-based software engineering
blog:https://shape-of-code.com


