arXiv cs.CL

AI Coding Agents Can Reproduce Social Science Findings

Signal: 82
Hype: 28
In three lines: SocSci-Repro-Bench, a benchmark of 221 social science tasks, evaluates AI agents' ability to reproduce published findings. Claude Code substantially outperforms Codex, with reproduction rates exceeding those reported in previous LLM-based agent benchmarks. Agents also perform strongly on reasoning tasks that require identifying research questions, and the results do not appear to be driven primarily by memorization.
Claude Code, Benchmarks, Code generation, Evals, Papers

Summary generated by Claude — human-verified