Memory Architectures for Multi-Turn Text-to-SQL: A Benchmark and Empirical Study
Signal: 82
Hype: 15
In three lines
EnterpriseMem-Bench, a multi-turn Text-to-SQL benchmark with 1,400 turns across 300 sessions, evaluates GPT-5 mini, GPT-5.2, Claude Sonnet 4.5/4.6, and Opus 4.6. Key findings: without memory, accuracy collapses by Turn 3; working memory dominates complex architectures; Sonnet 4.6 regresses 17-33pp on SEC EDGAR vs Sonnet 4.5.
Summary generated by Claude — human-verified