arXiv cs.CL · 19 May 2026

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

In three lines: BacktestBench is presented as the first large-scale benchmark for automated quantitative backtesting, containing 18,246 annotated QA pairs across 6 million real market records. AutoBacktest, a multi-agent system, translates natural-language strategies into reproducible backtests via a Summarizer, a SQL Retriever, and a Python Coder. The evaluation covers 23 mainstream LLMs.

Benchmarks Multi-agent Code generation AI Agents Papers

Summary generated by Claude — human-verified
