arXiv cs.AI

When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning

Signal: 78
Hype: 15
In three lines: A study across 6,000 task-condition pairs shows that multi-agent debate degrades generation (-1.6 to -15.5 pp) through critique-induced confusion, yet improves error detection (+27.4 pp F1). Adversarial separation with code-execution grounding and evidence-gated generation achieves +5.3 pp on generative tasks.
Multi-agent, AI Agents, Evals, Benchmarks

Summary generated by Claude — human-verified