Reddit r/MachineLearning

Making LLMs tell you how confident they really are through probe-targeted fine-tuning. [R]

Signal: 82
Hype: 18
In three lines
Research on probe-targeted fine-tuning (LoRA) for verbal confidence calibration in LLMs. Internally, the models separate correct from incorrect answers (0.76–0.88 AUROC), yet they state 99% confidence almost uniformly. The fine-tuning was applied across 8 models (7B–70B), with causal activation patching (ρ=0.976). Code and a pre-registration are available.
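To make the gap in the summary concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the labels, the `probe_scores` distribution, and the helper names `auroc` and `calibration_gap` are all assumptions chosen for illustration. It shows why a constant stated confidence carries no ranking information (AUROC of 0.5) even when an internal score does.

```python
import random


def auroc(scores, labels):
    """Probability that a random correct answer outscores a random
    incorrect one, counting ties as half a win (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_gap(confidences, labels):
    """Mean stated confidence minus empirical accuracy (positive = overconfident)."""
    return sum(confidences) / len(confidences) - sum(labels) / len(labels)


rng = random.Random(0)

# Synthetic correctness labels: roughly 70% of answers are correct (assumed rate).
labels = [1 if rng.random() < 0.7 else 0 for _ in range(500)]

# Hypothetical internal probe score: noisy, but shifted upward for correct answers.
probe_scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

# Verbal confidence as described in the summary: 99% for every answer.
verbal_conf = [0.99] * len(labels)

probe_auc = auroc(probe_scores, labels)
verbal_auc = auroc(verbal_conf, labels)
gap = calibration_gap(verbal_conf, labels)

print(f"probe AUROC:     {probe_auc:.3f}")
print(f"verbal AUROC:    {verbal_auc:.3f}")
print(f"overconfidence:  {gap:.3f}")
```

In this toy setup the internal score ranks correct answers above incorrect ones better than chance, while the uniform 99% statement cannot rank them at all and overstates accuracy. Narrowing that difference is what calibration-oriented fine-tuning aims at.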
Fine-tuning · Reasoning · Alignment · Evals · Papers

Summary generated by Claude — human-verified