arXiv cs.AI·19 May 2026

Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents


In three lines: A large-scale study of 64,380 SWE-bench runs across 126 agent configurations (43 frameworks × LLMs). Behavioral rules derived from a single framework do not transfer: the same signal (e.g., error rate) correlates positively with issue resolution in 47 configurations and negatively in 48. Framework identity explains 64% of the variance, versus 10% for LLM family.
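To make the sign-flip claim concrete, here is a minimal Python sketch of how one might count, per configuration, whether a behavioral signal correlates positively or negatively with issue resolution. It is not the paper's method: the function names (`pearson`, `sign_counts`), the configuration labels, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def sign_counts(runs_by_config):
    """Count configurations by the sign of corr(signal, resolved).

    runs_by_config maps a configuration label to a list of
    (signal_value, resolved) pairs, one pair per run.
    """
    counts = {"positive": 0, "negative": 0, "zero": 0}
    for runs in runs_by_config.values():
        signal = [value for value, _ in runs]
        resolved = [1.0 if ok else 0.0 for _, ok in runs]
        r = pearson(signal, resolved)
        key = "positive" if r > 0 else "negative" if r < 0 else "zero"
        counts[key] += 1
    return counts


# Made-up toy data (NOT from the study): error rate per run and whether
# the issue was resolved, for two hypothetical configurations.
toy = {
    "framework_a+llm_x": [(0.1, False), (0.4, True), (0.5, True), (0.2, False)],
    "framework_b+llm_x": [(0.1, True), (0.4, False), (0.5, False), (0.2, True)],
}
print(sign_counts(toy))
```

In this toy input the same signal points in opposite directions for the two configurations, which is the pattern the summary reports at scale (47 positive versus 48 negative).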


AI Agents · Benchmarks · Code generation · Evals

Summary generated by Claude — human-verified
