arXiv cs.AI

Same Signal, Different Semantics: A Cross-Framework Behavioral Analysis of Software Engineering Agents

Signal: 82
Hype: 15
In three lines: A large-scale study of 64,380 SWE-bench runs across 126 agent configurations (43 frameworks × LLMs). Behavioral rules derived from a single framework do not transfer: the same signal (e.g., error rate) correlates positively with issue resolution in 47 configurations and negatively in 48. Framework identity explains 64% of variance vs. 10% for LLM family.
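
To make the sign-flip finding concrete, here is a minimal sketch of how one might tally, per configuration, whether a behavioral signal correlates positively or negatively with issue resolution. It is not the paper's analysis code: the function names, the configuration labels, and the toy data are all hypothetical, and plain Pearson correlation is an assumption about the statistic used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined when one variable never changes
    return cov / sqrt(var_x * var_y)


def sign_tally(runs_by_config):
    """Count configurations with positive vs. negative signal/resolution correlation.

    runs_by_config maps a configuration label to a list of
    (signal_value, resolved) pairs, where resolved is 0 or 1.
    """
    positive = negative = 0
    for runs in runs_by_config.values():
        signal = [s for s, _ in runs]
        resolved = [float(r) for _, r in runs]
        r = pearson(signal, resolved)
        if r is None:
            continue  # skip configurations with no variation
        if r > 0:
            positive += 1
        elif r < 0:
            negative += 1
    return positive, negative


# Hypothetical toy data: (error rate, resolved) per run, not taken from the study.
toy_runs = {
    "framework_a+llm_x": [(0.1, 0), (0.2, 0), (0.5, 1), (0.6, 1)],
    "framework_b+llm_x": [(0.1, 1), (0.2, 1), (0.5, 0), (0.6, 0)],
}
tally = sign_tally(toy_runs)
print(tally)
```

In this toy data the same error-rate signal points in opposite directions for the two configurations, which is the pattern the summary describes at scale (47 positive vs. 48 negative).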
Tags: AI Agents, Benchmarks, Code generation, Evals

Summary generated by Claude — human-verified