arXiv cs.CL

HRM-Text: Efficient Pretraining Beyond Scaling

Signal: 78 | Hype: 35
In three lines: HRM-Text replaces the standard Transformer with a Hierarchical Recurrent Model that decouples slow, strategic layers from fast execution layers. A 1B-parameter model trained on 40B tokens for $1,500 reaches 60.7% on MMLU, 81.9% on ARC-C, 82.2% on DROP, 84.5% on GSM8K, and 56.2% on MATH. It uses 100-900x fewer training tokens and 96-432x less compute than the baselines it is compared against.
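The card describes the architecture only at a high level. As a rough illustration of what "slow strategic" and "fast execution" layers can mean, here is a minimal sketch of a generic two-timescale recurrence. It assumes a schedule in which the slow module updates once for every few fast-module steps. The card does not specify HRM-Text's actual schedule, cell design, or sizes, so the toy `cell`, its scalar weights, `hierarchical_forward`, `n_cycles`, and `fast_steps` are all hypothetical and do not represent the paper's implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cell(weights: List[float], *inputs: Vector) -> Vector:
    """Hypothetical toy recurrent cell: elementwise tanh of a weighted sum of inputs."""
    assert len(weights) == len(inputs), "one scalar weight per input vector"
    return [
        math.tanh(sum(w * v[i] for w, v in zip(weights, inputs)))
        for i in range(len(inputs[0]))
    ]


def hierarchical_forward(
    x: Vector, n_cycles: int = 2, fast_steps: int = 4
) -> Tuple[Vector, int, int]:
    """Run an illustrative two-timescale recurrence (assumed schedule, not HRM-Text's).

    The slow (strategic) state is updated once per cycle. The fast (execution)
    state is updated `fast_steps` times per cycle, conditioned on the current
    slow state and the input x. Returns the final slow state and update counts.
    """
    z_slow = [0.0] * len(x)
    z_fast = [0.0] * len(x)
    fast_updates = slow_updates = 0
    for _ in range(n_cycles):
        for _ in range(fast_steps):
            # Fast module: reads its own state, the slow state (fixed during
            # this inner loop), and the input.
            z_fast = cell([0.5, 0.3, 0.2], z_fast, z_slow, x)
            fast_updates += 1
        # Slow module: updates once per cycle from the fast module's final state.
        z_slow = cell([0.7, 0.3], z_slow, z_fast)
        slow_updates += 1
    return z_slow, fast_updates, slow_updates


if __name__ == "__main__":
    z, n_fast, n_slow = hierarchical_forward([1.0, -0.5, 0.25])
    print(n_fast, n_slow)  # prints: 8 2
```

In this sketch, the slow state changes only once for every `fast_steps` updates of the fast state. That separation of timescales is the general idea the summary refers to. Real models would use learned matrix-valued layers rather than scalar weights.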
Tags: Papers, Benchmarks, Reasoning, Infrastructure

Summary generated by Claude; human-verified.