[SystemSafety] An LLM generated C compiler
Derek M Jones
derek at knosof.co.uk
Tue Feb 24 11:56:06 CET 2026
All,
An LLM has generated a C compiler.
This is interesting because a compiler involves connecting many
complicated moving parts, and this one is relatively large (100K LOC).
How does the compiler perform?
It performs well when handling correct C source, and the
generated code works (for my small test cases).
It does very poorly at detecting semantically incorrect C.
This is not surprising given that the bulk of the publicly
available C source it was trained on does not contain semantic errors.
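
To make "semantically incorrect" concrete, here is a minimal sketch of
the kind of input involved. It is a hypothetical test file, not one of
the cases from the analysis; the file name sem.c and the macro names
are assumptions made up for illustration. Each macro switches on one
constraint violation, i.e. code that parses but for which the C
Standard requires a diagnostic.

```c
#include <assert.h>

/* Hypothetical test file (sem.c), not taken from the original post.
 * It is valid C as written. Defining one of the BAD_* macros, e.g.
 *     cc -c -DBAD_CONST_ASSIGN sem.c
 * enables one constraint violation that must be diagnosed. */

struct point { int x, y; };

int sum_point(struct point p)
{
#ifdef BAD_STRUCT_ARITH
    return p + 1;   /* '+' applied to a structure operand */
#else
    return p.x + p.y;
#endif
}

int read_limit(void)
{
    const int limit = 10;
#ifdef BAD_CONST_ASSIGN
    limit = 20;     /* assignment to a const-qualified object */
#endif
    return limit;
}

int classify(int n)
{
    switch (n) {
    case 1:
        return 10;
#ifdef BAD_DUPLICATE_CASE
    case 1:         /* duplicate case label in one switch */
        return 20;
#endif
    default:
        return 0;
    }
}

/* Sanity check for the default (valid) configuration only. */
void self_check(void)
{
    struct point p = {2, 3};

    assert(sum_point(p) == 5);
    assert(read_limit() == 10);
    assert(classify(1) == 10);
    assert(classify(7) == 0);
}
```

Built without any BAD_* macro the file should compile cleanly. A
compiler that accepts one of the -DBAD_* variants without issuing any
diagnostic is missing the corresponding semantic check.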
At least one of the optimizations does nothing (based on a few basic
tests), even though code claiming to perform the optimization exists.
Figuring out that the optimization is not happening is a very different
skill set from implementing the algorithm to perform it.
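
A minimal sketch of how such a check might be set up follows. The
optimization involved is not named above, so constant folding and
loop-invariant code motion are used here purely as assumed examples,
and the function names are hypothetical. Running the program cannot
reveal a missing optimization, because the results are identical
either way; the evidence is in the generated code.

```c
#include <assert.h>

/* Hypothetical probe file, not taken from the original post. */

/* Candidate for constant propagation and folding: every operand is
 * known at compile time, so the body can reduce to "return 42". */
int fold_candidate(void)
{
    int a = 6;
    int b = 7;
    return a * b;
}

/* Candidate for loop-invariant code motion: x * y does not change
 * inside the loop, so it only needs to be computed once. */
int hoist_candidate(int x, int y, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += x * y;
    return total;
}

/* The results must be the same whether or not the optimizations
 * run; only the generated code differs. */
void self_check(void)
{
    assert(fold_candidate() == 42);
    assert(hoist_candidate(3, 4, 5) == 60);
    assert(hoist_candidate(3, 4, 0) == 0);
}
```

Assuming the compiler under test accepts the conventional -O2 and -S
options, the assembler output for fold_candidate can be inspected for
a multiply instruction, and the loop in hoist_candidate for a multiply
inside the loop body. If either is still present with optimization
enabled, the corresponding pass is probably not doing anything for
this input.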
While the compiler source has been released, the prompts used have not
been released. I suspect that the prompts contain an awful lot of
detailed instructions written by somebody who knows lots about compilers.
More analysis here:
https://shape-of-code.com/2026/02/22/investigating-an-llm-generated-c-compiler/
--
Derek M. Jones Evidence-based software engineering
blog:https://shape-of-code.com