[SystemSafety] Bugs in LLM generated proofs

Paul Sherwood paul.sherwood at codethink.co.uk
Fri Feb 13 10:54:00 CET 2026


Hi Derek

On 2026-01-14 16:23, Derek M Jones wrote:
> The main reason I prefer Deepseek and Kimi for solving maths
> problems is that they provide chain-of-thought.  So I can see
> how they have interpreted my question (not always how I intended),
> and the simplifications they make (not always applicable).

Are you confident that the provided chain of thought actually aligns 
with the path the model has followed [1]?

br
Paul

[1] https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf

