Can Entry-Wise Clipping Give Spectral Control of Stochastic Gradients?
In three lines
Theoretical paper on spectral control of stochastic gradient noise via entry-wise clipping.
Shows that simple entry-wise clipping balances matrix structure against computational cost, with O(ε⁻⁴) convergence guarantees under Cauchy-contaminated noise.
Empirical gains: ~7% token savings on NanoGPT with smooth shrinkage, and a further ~2% when combined with Muon.
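Why entry-wise clipping can give spectral control at all can be seen in a minimal sketch, assuming a gradient matrix stored as nested Python lists and a clipping threshold `tau` (both illustrative; the paper's exact clipping rule and threshold choice are not described in this summary). `clip_entrywise` is plain hard clipping. `smooth_shrink` uses `tau * tanh(x / tau)` as one hypothetical smooth shrinkage, not necessarily the one the paper uses. The key fact: if every entry of an m x n matrix has magnitude at most `tau`, its spectral norm is at most its Frobenius norm, which is at most `tau * sqrt(m * n)`.

```python
import math
from typing import List

Matrix = List[List[float]]


def clip_entrywise(g: Matrix, tau: float) -> Matrix:
    """Hard entry-wise clipping: project each entry into [-tau, tau]."""
    return [[max(-tau, min(tau, x)) for x in row] for row in g]


def smooth_shrink(g: Matrix, tau: float) -> Matrix:
    """Hypothetical smooth shrinkage (an assumption, not the paper's rule):
    close to the identity for |x| << tau, saturating at +/- tau for large |x|."""
    return [[tau * math.tanh(x / tau) for x in row] for row in g]


def frobenius_norm(g: Matrix) -> float:
    """Frobenius norm, which upper-bounds the spectral norm ||G||_2."""
    return math.sqrt(sum(x * x for row in g for x in row))


def spectral_upper_bound(m: int, n: int, tau: float) -> float:
    """If |G_ij| <= tau for all entries, ||G||_2 <= ||G||_F <= tau * sqrt(m * n)."""
    return tau * math.sqrt(m * n)


# Illustrative 2x3 gradient with one heavy-tailed (Cauchy-like) outlier entry.
g = [[0.1, -0.2, 50.0],
     [0.3, 0.05, -0.4]]
tau = 1.0

c = clip_entrywise(g, tau)  # the outlier 50.0 becomes 1.0; other entries unchanged
s = smooth_shrink(g, tau)   # small entries shrink slightly; the outlier saturates near tau

print(c)
print(frobenius_norm(g), frobenius_norm(c), spectral_upper_bound(2, 3, tau))
```

The bound depends only on `tau` and the matrix shape, so a single Cauchy-like outlier can no longer blow up the spectral norm of the update. The cost is one elementwise pass over the gradient, with no SVD or matrix iteration required.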
Summary generated by Claude — human-verified