Back to feed
arXiv cs.AI·

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

Signal
82
Hype
15
In three linesDecisionBench is a benchmark for evaluating emergent delegation in long-horizon multi-agent workflows. The substrate includes 11 models (7 vendor families), GAIA/tau-bench/BFCL tasks, and multi-axis metrics (quality, cost, latency, routing fidelity). Results show quality alone masks orchestration signals, and delivery channel dominates description content.
Read source
Your take?
AI AgentsMulti-agentBenchmarksReasoning

Summary generated by Claude — human-verified