arXiv cs.LG

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

Signal 72 · Hype 18
In three lines
A study of delayed generalization (grokking) during language model pre-training. Using an exposure-based framework and BLiMP minimal pairs, the authors observe delayed generalization across five grammatical phenomena. After generalization, grammatical concept vectors become more predictive and occupy higher-dimensional subspaces.
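For readers unfamiliar with BLiMP: each item is a minimal pair, a grammatical sentence and an ungrammatical one that differ in a single respect, and a model counts as correct on the pair when it assigns the grammatical sentence the higher total log-probability. Below is a minimal sketch of that scoring, assuming per-token log-probabilities have already been obtained from some model at a given checkpoint. The function names and the log-prob values are hypothetical and for illustration only; this is not the authors' exposure-based pipeline.

```python
from typing import List, Sequence, Tuple

# A pair is (grammatical token log-probs, ungrammatical token log-probs).
Pair = Tuple[Sequence[float], Sequence[float]]


def sentence_logprob(token_logprobs: Sequence[float]) -> float:
    """Total log-probability of a sentence: the sum of its per-token log-probs."""
    return sum(token_logprobs)


def minimal_pair_accuracy(pairs: List[Pair]) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(
        1 for good, bad in pairs
        if sentence_logprob(good) > sentence_logprob(bad)
    )
    return correct / len(pairs)


# Hypothetical per-token log-probs for two minimal pairs at one checkpoint.
pairs: List[Pair] = [
    ([-2.1, -0.5, -1.0], [-2.1, -3.2, -1.0]),  # grammatical sentence preferred
    ([-1.0, -4.0], [-1.0, -2.5]),              # ungrammatical sentence preferred
]
print(minimal_pair_accuracy(pairs))  # 0.5
```

Repeating this computation across pre-training checkpoints yields the kind of accuracy curve on which a delayed rise in grammatical performance would show up.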
Papers · Reasoning · Evals

Summary generated by Claude; human-verified.