[SystemSafety] Bugs in LLM generated proofs
Derek M Jones
derek at knosof.co.uk
Fri Feb 13 11:58:00 CET 2026
Paul,
>> The main reason I prefer Deepseek and Kimi for solving maths
>> problems is that they provide chain-of-thought. So I can see
>> how they have interpreted my question (not always how I intended),
>> and the simplifications they make (not always applicable).
>
> Are you confident that the provided chain of thought actually aligns with the path the model has followed [1]?
As in the chain-of-thought (CoT) says "1+1=1" (this happened to me once)
and the model then acts as if the answer is 2.
I have not paid particular attention to this.
CoT is actually a tree, but the LLM output does not always make
the tree structure explicit. So the "1+1=1" branch may have
terminated, with the model going down a "1+1=2" branch without
reporting anything.
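As a rough sketch of what I mean (purely illustrative, my own toy
names, not how any particular model works): treat each reasoning
step as a node in a tree, and the emitted CoT as a flattening of
that tree. The branch boundaries are lost, so a dead-end step can
sit in the transcript with nothing marking it as abandoned.

  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class Step:
      text: str                          # one reasoning step
      children: List["Step"] = field(default_factory=list)

  def emitted_cot(node: Step) -> List[str]:
      # Flatten the tree in exploration order.  The branch
      # structure is lost, so a reader cannot tell which steps
      # were abandoned and which led to the answer.
      out = [node.text]
      for child in node.children:
          out.extend(emitted_cot(child))
      return out

  root = Step("compute 1+1", [
      Step("1+1=1"),                      # dead-end branch
      Step("1+1=2, so the answer is 2"),  # branch actually followed
  ])
  print(emitted_cot(root))
  # ['compute 1+1', '1+1=1', '1+1=2, so the answer is 2']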
Following a CoT is much more like following an argument on a
whiteboard, which tends to jump to different parts of the board.
Sometimes the CoT does look like a derivation that would appear in a
book or paper.
> [1] https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
This is one of those "AI with evil intent might hide important details"
papers.
If you work on the Alignment Science Team, then your job
depends on believing that LLMs are not just sophisticated
token predictors.
There is a great analysis in this thread:
https://x.com/sebkrier/status/2020561261751062664
--
Derek M. Jones Evidence-based software engineering
blog:https://shape-of-code.com