arXiv cs.CL

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

Signal: 78
Hype: 15
In three lines: Stepwise Confidence Attribution (SCA) diagnoses multi-step reasoning failures in closed-source LLMs by assigning step-level confidence from generated traces alone. It comes in two variants: NIBS (non-parametric) and GIBS (graph-based). On mathematical reasoning and multi-hop QA, SCA reliably identifies error-prone steps and improves self-correction success by up to 13.5%.
Reasoning, Evals, Papers

Summary generated by Claude — human-verified