arXiv cs.LG

Anytime Training with Schedule-Free Spectral Optimization

Signal: 78 · Hype: 18
In three lines: SF-NorMuon, a schedule-free spectral optimizer, matches or exceeds tuned AdamW on 125M- and 772M-parameter language models without requiring a predefined learning-rate schedule. The paper also proves a stationarity guarantee and identifies weight decay as essential for long-horizon stability.
Reinforcement learning · Benchmarks · Papers

Summary generated by Claude — human-verified