Prefix-Safe Bayesian Belief Tracking for LLM Reasoning Reliability: Separating Calibration from Ranking
Signal: 72
Hype: 18
In three lines
Sequential Bayesian Belief Tracking (SBBT) is a method for estimating the reliability of an LLM reasoning trace before the final answer is produced.
It evaluates the prefix-level probability P(y=1|o_{1:t}) on MATH-500, GSM8K, AIME 2025, and RIMO-N.
Scalar scores improve calibration (Brier score), while structure-aware signals gain +0.110 AUROC in hard math settings.
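To make the calibration-versus-ranking distinction concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, the toy numbers are invented for illustration, and the belief update assumes each observation is conditionally independent given y, which the summary does not state. It shows a log-odds update for P(y=1|o_{1:t}) and two score sets that rank identically (same AUROC) but differ in calibration (different Brier score).

```python
import math

def sequential_posterior(prior, likelihood_ratios):
    """Return P(y=1 | o_{1:t}) for every prefix t.

    likelihood_ratios[t] is P(o_t | y=1) / P(o_t | y=0).
    Assumption (ours, for illustration): observations are
    conditionally independent given y, so log-odds simply add.
    """
    log_odds = math.log(prior / (1.0 - prior))
    beliefs = []
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
        beliefs.append(1.0 / (1.0 + math.exp(-log_odds)))
    return beliefs

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy trace: three observations with made-up likelihood ratios.
beliefs = sequential_posterior(0.5, [2.0, 0.5, 4.0])

# Two toy score sets with the same ordering but different sharpness.
labels = [1, 1, 0, 0]
sharp = [0.9, 0.8, 0.3, 0.2]
squashed = [0.6, 0.55, 0.45, 0.4]
print(auroc(sharp, labels), auroc(squashed, labels))
print(brier_score(sharp, labels), brier_score(squashed, labels))
```

Because AUROC depends only on the ordering of scores, any monotone rescaling leaves it unchanged, while the Brier score rewards probabilities that are numerically close to the outcomes. That is why a signal can help one metric without helping the other.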
Summary generated by Claude — human-verified