arXiv cs.AI

Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking

Signal 72 · Hype 18
In three lines: Sequential Bayesian Belief Tracking (SBBT) is a method for estimating the reliability of an LLM reasoning trace before the final answer is produced. It evaluates the prefix-level probability P(y=1|o_{1:t}) on MATH-500, GSM8K, AIME 2025, and RIMO-N. Scalar scores improve calibration (Brier score), while structure-aware signals gain +0.110 AUROC in hard math settings.
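The split between calibration (Brier score) and ranking (AUROC) can be seen in a toy sketch. This is not the paper's SBBT method or its data: the labels, the scores, and the helper names `brier` and `auroc` are made up for illustration. It only shows that squashing scores toward 0.5 with an order-preserving transform changes the Brier score while leaving AUROC untouched, which is why the two metrics can move independently.

```python
# Toy illustration (hypothetical data, not from the paper):
# Brier score measures calibration, AUROC measures ranking.

def brier(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical correctness labels (1 = trace ended in a correct answer)
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical reliability scores for the same traces
raw = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
# Order-preserving transform that pulls every score toward 0.5
squashed = [0.5 + 0.1 * (p - 0.5) for p in raw]

print("Brier raw/squashed:", brier(raw, labels), brier(squashed, labels))
print("AUROC raw/squashed:", auroc(raw, labels), auroc(squashed, labels))
```

Because AUROC depends only on the ordering of scores, a signal can improve ranking without improving calibration, and the reverse.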
Reasoning · Evals · Benchmarks

Summary generated by Claude — human-verified