Reddit r/MachineLearning · 29 May 2026

Making LLMs tell you how confident they really are through probe-targeted fine tuning.[R]

In three lines: Research on probe-targeted fine-tuning (LoRA) for calibrating the verbal confidence of LLMs. Probes show that the models internally detect whether an answer is correct (0.76–0.88 AUROC), yet the models uniformly state 99% confidence. Fine-tuning is applied across 8 models (7B–70B parameters), with causal activation patching reported at ρ=0.976. Code and a pre-registration are available.
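To make the headline contrast concrete, here is a minimal Python sketch. It assumes per-question correctness labels and two score sources: a probe's score and the model's stated confidence. The toy values are invented for illustration and are not from the paper. Expected calibration error (ECE) is an assumed choice of calibration metric; the summary does not say which one the authors use. The sketch shows why a constant "99%" carries no ranking information (AUROC 0.5) even when a probe separates correct from incorrect answers, and why it is badly calibrated when accuracy is far below 99%.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney statistic: the probability that a randomly
    chosen correct answer (label 1) scores higher than a randomly chosen
    incorrect answer (label 0), with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_calibration_error(
    confidences: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence|
    per bin. This metric is an assumption, not necessarily the paper's."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        # Confidence 1.0 would map to index n_bins, so clamp into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(y for _, y in b) / len(b)
        mean_conf = sum(c for c, _ in b) / len(b)
        ece += (len(b) / total) * abs(acc - mean_conf)
    return ece


# Hypothetical toy data: 1 = the model's answer was correct, 0 = incorrect.
labels = [1, 1, 1, 0, 0, 0]
probe_scores = [0.9, 0.7, 0.4, 0.6, 0.5, 0.2]  # hypothetical probe outputs
verbal_conf = [0.99] * len(labels)              # "99% confident" every time

print(f"probe AUROC:  {auroc(probe_scores, labels):.3f}")
print(f"verbal AUROC: {auroc(verbal_conf, labels):.3f}")
print(f"verbal ECE:   {expected_calibration_error(verbal_conf, labels):.3f}")
```

The constant confidence ties on every correct/incorrect pair, so its AUROC is exactly 0.5, while the toy probe ranks 7 of 9 pairs correctly. The gap between those two numbers is the signal that probe-targeted fine-tuning aims to move into the model's stated confidence.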

Fine-tuning · Reasoning · Alignment · Evals · Papers

Summary generated by Claude — human-verified