arXiv cs.AI

WebGameBench: Requirement-to-Application Evaluation for Coding Agents via Browser-Native Games

Signal: 78 · Hype: 25
In three lines
WebGameBench is a requirement-to-application benchmark that evaluates whether coding agents can turn a web game specification into a browser-playable application. Across 111 tasks and 12 agents, the best configuration reaches a 76.9% usable rate but only a 20.2% excellent rate. This reveals a gap between minimum delivery and full requirement satisfaction.
AI Agents · Code generation · Benchmarks · Evals

Summary generated by Claude — human-verified