Hugging Face Blog

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

Signal: 72 · Hype: 28
In three lines: IBM and UC Berkeley use IT-Bench and MAST to diagnose why enterprise agents fail. IT-Bench benchmarks agents on realistic IT tasks, while MAST (Multi-Agent System Failure Taxonomy) classifies the failure modes that show up in agent execution traces.
AI Agents · Multi-agent · Benchmarks · Evals

Summary generated by Claude — human-verified