arXiv cs.CL·21 May 2026

HRM-Text: Efficient Pretraining Beyond Scaling

In three lines: HRM-Text replaces standard Transformers with a Hierarchical Recurrent Model that decouples slow strategic layers from fast execution layers. A 1B-parameter model trained on 40B tokens for $1,500 achieves 60.7% MMLU, 81.9% ARC-C, 82.2% DROP, 84.5% GSM8K, and 56.2% MATH. That is 100-900x fewer tokens and 96-432x less compute than baselines.
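To put the token ratios in absolute terms, here is a minimal arithmetic sketch. It assumes the 100-900x range is measured against the 40B-token training run quoted above; the function name and constant are illustrative and do not come from the paper.

```python
# Training tokens quoted in the summary: 40B.
HRM_TEXT_TOKENS = 40 * 10**9


def implied_baseline_tokens(model_tokens, ratio_low, ratio_high):
    """Return the (low, high) baseline token counts implied by a 'fewer tokens' ratio range."""
    return model_tokens * ratio_low, model_tokens * ratio_high


# Assumption: "100-900x fewer tokens" is relative to the 40B-token run.
low, high = implied_baseline_tokens(HRM_TEXT_TOKENS, 100, 900)
print(f"Implied baseline budgets: {low / 1e12:.0f}T to {high / 1e12:.0f}T tokens")
```

Under that assumption, the comparison baselines would have been trained on roughly 4T to 36T tokens.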

Papers Benchmarks Reasoning Infrastructure

Summary generated by Claude — human-verified
