[SystemSafety] Bugs in LLM generated proofs
Paul Sherwood
paul.sherwood at codethink.co.uk
Fri Feb 13 12:29:44 CET 2026
Thanks, Derek. That thread is very helpful.
On 2026-02-13 10:58, Derek M Jones wrote:
> Paul,
>
>>> The main reason I prefer Deepseek and Kimi for solving maths
>>> problems is that they provide chain-of-thought. So I can see
>>> how they have interpreted my question (not always how I intended),
>>> and the simplifications they make (not always applicable).
>>
>> Are you confident that the provided chain of thought actually aligns
>> with the path the model has followed [1]?
>
> As in: the chain-of-thought (CoT) says "1+1=1" (this happened to me once)
> and the model then acts as if the answer is 2.
>
> I have not paid particular attention to this.
> CoT is actually a tree, but the LLM output does not always make
> the tree structure explicit. So the "1+1=1" branch may have
> terminated, with the model going down a "1+1=2" branch without
> reporting anything.
>
> Following a CoT is much more like following an argument on a
> whiteboard, which tends to jump to different parts of the board.
> Sometimes a CoT does look like a derivation that appears in a
> book or paper.
>
>> [1]
>> https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
>
> This is one of those "AI with evil intent might hide important details"
> papers.
>
> If you work on the Alignment Science Team, then your job
> depends on believing that LLMs are not just sophisticated
> token predictors.
> There is a great analysis of this in the thread at
> https://x.com/sebkrier/status/2020561261751062664
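
To make the tree point concrete, here is a minimal sketch of how I picture
it. The Step class and visible_trace function below are purely illustrative
(not anything from the Anthropic paper or from any real model's internals):
the CoT is a tree of reasoning steps, but a reader only ever sees a
depth-first flattening with no branch markers, so an abandoned "1+1=1" step
can sit right next to the "1+1=2" step the model actually followed, or be
dropped without anything being reported.

from dataclasses import dataclass, field

@dataclass
class Step:
    text: str                      # one reasoning step as emitted text
    dead_end: bool = False         # branch the model abandoned
    children: list["Step"] = field(default_factory=list)

def visible_trace(node: Step) -> list[str]:
    # Depth-first flattening into the linear CoT a reader sees.
    # dead_end is recorded in the tree but ignored here, which is
    # exactly the information that goes missing: branch boundaries
    # are not marked, and some outputs drop the dead branch entirely.
    trace = [node.text]
    for child in node.children:
        trace.extend(visible_trace(child))
    return trace

root = Step("Compute 1+1.")
root.children = [
    Step("1+1=1", dead_end=True),  # abandoned branch, still printed
    Step("1+1=2", children=[Step("So the answer is 2.")]),
]

print("\n".join(visible_trace(root)))
# Compute 1+1.
# 1+1=1
# 1+1=2
# So the answer is 2.

Purely a toy, of course; the point is just that a linear trace cannot
distinguish a wrong step the model relied on from an abandoned branch it
happened to print.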