arXiv cs.CL·19 May 2026

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

In three lines: AgentKernelArena is an open-source benchmark for evaluating AI agents on GPU kernel optimization. It contains 196 tasks across three categories (HIP-to-HIP, Triton-to-Triton, PyTorch-to-HIP) and tests generalization on unseen configurations. The tested agents (Cursor Agent, Claude Code, Codex) achieve speedups of up to 6.89x but show generalization weaknesses on PyTorch-to-HIP tasks.

AI Agents · Code generation · Benchmarks · Claude Code

Summary generated by Claude — human-verified
