arXiv cs.CL

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

Signal: 78
Hype: 15
In three lines
AgentKernelArena is an open-source benchmark for evaluating AI agents on GPU kernel optimization. It contains 196 tasks across three categories (HIP-to-HIP, Triton-to-Triton, PyTorch-to-HIP) and tests generalization on unseen configurations. The tested agents (Cursor Agent, Claude Code, Codex) achieve speedups of up to 6.89x but show weaker generalization on PyTorch-to-HIP tasks.
AI Agents · Code generation · Benchmarks · Claude Code

Summary generated by Claude — human-verified