arXiv cs.LG

UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing

Signal: 82 · Hype: 15
In three lines: UCCI is an LLM cascade router that uses calibrated uncertainty to reduce inference cost. It applies isotonic regression to map token-level margin uncertainty to a per-query error probability, then selects the escalation threshold by cost minimization. On 75,000 NER queries with 4B and 12B models, UCCI cuts cost by 31% while reducing calibration error from 0.12 to 0.03.
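The two steps in the summary (calibrate, then choose a threshold) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not UCCI's implementation. The function names `fit_isotonic`, `predict_error`, and `choose_threshold` are hypothetical. So is the cost model (`c_small`, `c_large`, `c_error`, `large_err`), since the summary does not give the paper's objective. The calibration step uses the standard pool-adjacent-violators algorithm for isotonic regression, and it assumes a higher uncertainty score means the small model is more likely to be wrong.

```python
from bisect import bisect_left

def fit_isotonic(scores, errors):
    """Fit a non-decreasing step map score -> P(error) with pool-adjacent-violators."""
    # Pool tied scores first so each distinct score is one starting block.
    agg = {}
    for s, e in zip(scores, errors):
        tot, cnt = agg.get(s, (0.0, 0))
        agg[s] = (tot + e, cnt + 1)
    blocks = []  # each block: [sum of errors, count, largest score in block]
    for s in sorted(agg):
        tot, cnt = agg[s]
        blocks.append([tot, cnt, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            tot2, cnt2, hi = blocks.pop()
            blocks[-1][0] += tot2
            blocks[-1][1] += cnt2
            blocks[-1][2] = hi
    uppers = [b[2] for b in blocks]
    probs = [b[0] / b[1] for b in blocks]
    return uppers, probs

def predict_error(model, score):
    """Look up the calibrated error probability for one uncertainty score."""
    uppers, probs = model
    i = min(bisect_left(uppers, score), len(probs) - 1)
    return probs[i]

def choose_threshold(p_err, c_small=1.0, c_large=4.0, c_error=20.0, large_err=0.0):
    """Return (t, mean cost) minimizing expected cost when p >= t escalates.

    All cost parameters are illustrative assumptions, not values from the paper.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(p_err)) + [float("inf")]:  # inf means never escalate
        total = 0.0
        for p in p_err:
            if p >= t:  # escalated: pay for both models, risk the large model's error
                total += c_small + c_large + c_error * large_err
            else:       # answered by the small model: risk its calibrated error
                total += c_small + c_error * p
        mean_cost = total / len(p_err)
        if mean_cost < best_cost:
            best_t, best_cost = t, mean_cost
    return best_t, best_cost

# Toy calibration set (made-up data): uncertainty scores and small-model errors.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
errors = [0, 0, 1, 0, 0, 1, 1, 1]
model = fit_isotonic(scores, errors)
p_err = [predict_error(model, s) for s in scores]
threshold, mean_cost = choose_threshold(p_err)
```

At inference time, a router built this way would send a query to the larger model whenever `predict_error(model, score) >= threshold`. In practice the threshold would be chosen on held-out data rather than on the set used to fit the calibrator.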
AI Agents · Evals · Infrastructure · Reasoning

Summary generated by Claude — human-verified