[SystemSafety] Bugs in LLM generated proofs
Paul Sherwood
paul.sherwood at codethink.co.uk
Fri Feb 13 12:29:44 CET 2026
Thanks, Derek. That thread is very helpful.
On 2026-02-13 10:58, Derek M Jones wrote:
> Paul,
>
>>> The main reason I prefer Deepseek and Kimi for solving maths
>>> problems is that they provide chain-of-thought. So I can see
>>> how they have interpreted my question (not always how I intended),
>>> and the simplifications they make (not always applicable).
>>
>> Are you confident that the provided chain of thought actually aligns
>> with the path the model has followed [1]?
>
> As in: the chain-of-thought (CoT) says "1+1=1" (this happened to me once)
> and the model then acts as if the answer is 2.
>
> I have not paid particular attention to this.
> CoT is actually a tree, but the LLM output does not always make
> the tree structure explicit. So the "1+1=1" branch may have
> terminated, with the model going down a "1+1=2" branch without
> reporting anything.
>
> Following a CoT is much more like following an argument on a
> whiteboard, which tends to jump to different parts of the board.
> Sometimes a CoT does look like a derivation that appears in a
> book or paper.
>
>> [1]
>> https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf
>
> This is one of those "AI with evil intent might hide important details"
> papers.
>
> If you work on the Alignment Science Team, then your job
> depends on believing that LLMs are not just sophisticated
> token predictors.
> There is a great analysis of this in the thread at
> https://x.com/sebkrier/status/2020561261751062664
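
To make the tree point concrete, here is a minimal sketch of how I picture
it. The Step class and visible_trace function below are purely illustrative
(not anything from the Anthropic paper or from any real model's internals):
the CoT is a tree of reasoning steps, but a reader only ever sees a
depth-first flattening with no branch markers, so an abandoned "1+1=1" step
can sit right next to the "1+1=2" step the model actually followed, or be
dropped without anything being reported.

from dataclasses import dataclass, field

@dataclass
class Step:
    text: str                      # one reasoning step as emitted text
    dead_end: bool = False         # branch the model abandoned
    children: list["Step"] = field(default_factory=list)

def visible_trace(node: Step) -> list[str]:
    # Depth-first flattening into the linear CoT a reader sees.
    # dead_end is recorded in the tree but ignored here, which is
    # exactly the information that goes missing: branch boundaries
    # are not marked, and some outputs drop the dead branch entirely.
    trace = [node.text]
    for child in node.children:
        trace.extend(visible_trace(child))
    return trace

root = Step("Compute 1+1.")
root.children = [
    Step("1+1=1", dead_end=True),  # abandoned branch, still printed
    Step("1+1=2", children=[Step("So the answer is 2.")]),
]

print("\n".join(visible_trace(root)))
# Compute 1+1.
# 1+1=1
# 1+1=2
# So the answer is 2.

Purely a toy, of course; the point is just that a linear trace cannot
distinguish a wrong step the model relied on from an abandoned branch it
happened to print.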