When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
Signal: 78
Hype: 15
In three lines
A study across 6,000 task-condition pairs shows that multi-agent debate degrades generation (-1.6 to -15.5pp) through critique-induced confusion, yet improves error detection (+27.4pp F1). Adversarial separation with code-execution grounding and evidence-gated generation achieves +5.3pp on generative tasks.
Summary generated by Claude — human-verified