arXiv cs.CL · 20 May 2026

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

In three lines: Stepwise Confidence Attribution (SCA) diagnoses multi-step reasoning failures in closed-source LLMs by assigning step-level confidence from generated traces alone. It comes in two methods: NIBS (non-parametric) and GIBS (graph-based). On mathematical reasoning and multi-hop QA, SCA reliably identifies error-prone steps and improves self-correction success by up to 13.5%.

Reasoning Evals Papers

Summary generated by Claude — human-verified
