arXiv cs.AI

Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

Signal: 78 · Hype: 25
In three lines: A study finds that chess-trained language models memorize rather than generalize. KinGPT (25M params) outperforms ChessGPT (3B) and C1-4B on chess benchmarks, but the analysis reveals pattern-matching rather than generalization. LLM-Modulo, a verifier-in-the-loop framework, improves RedPajama 3B's move accuracy from 1.2% to 21.2%. Code and models are open-sourced.
Tags: Benchmarks, Evals, Fine-tuning, Papers

Summary generated by Claude — human-verified