[SystemSafety] Bugs in LLM generated proofs
Paul Sherwood
paul.sherwood at codethink.co.uk
Fri Feb 13 10:54:00 CET 2026
Hi Derek
On 2026-01-14 16:23, Derek M Jones wrote:
> The main reason I prefer Deepseek and Kimi for solving maths
> problems is that they provide chain-of-thought. So I can see
> how they have interpreted my question (not always how I intended),
> and the simplifications they make (not always applicable).
Are you confident that the provided chain of thought actually aligns
with the path the model has followed [1]?
br
Paul
[1] https://assets.anthropic.com/m/71876fabef0f0ed4/original/reasoning_models_paper.pdf