Anytime Training with Schedule-Free Spectral Optimization
Signal: 78
Hype: 18
In three lines
SF-NorMuon, a schedule-free spectral optimizer, matches or exceeds tuned AdamW on 125M- and 772M-parameter language models without requiring a predefined learning-rate schedule. The work also provides a theoretical proof of a stationarity guarantee and identifies weight decay as essential for long-horizon stability.
Summary generated by Claude — human-verified