arXiv cs.LG

Can Entry-Wise Clipping Give Spectral Control of Stochastic Gradients?

Signal: 72
Hype: 15
In three lines
A theoretical paper asking whether entry-wise clipping can give spectral control of stochastic gradient noise. It shows that simple entry-wise clipping strikes a balance between respecting matrix structure and keeping computational cost low, and it proves O(ε⁻⁴) convergence guarantees under Cauchy-contaminated noise. Reported empirical gains: about 7% token savings on NanoGPT with smooth shrinkage, and about 2% more when combined with Muon.
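For readers who want a concrete picture of the operation, here is a minimal NumPy sketch of entry-wise clipping applied to a gradient matrix with heavy-tailed contamination. It is an illustration under stated assumptions, not the paper's implementation: the function names `clip_entrywise` and `shrink_smooth`, the threshold `c`, the tanh form of the smooth shrinkage, the 1% contamination rate, and the toy matrix sizes are all hypothetical choices made for this example.

```python
import numpy as np


def clip_entrywise(grad, c):
    """Hard entry-wise clipping: every entry is limited to [-c, c]."""
    return np.clip(grad, -c, c)


def shrink_smooth(grad, c):
    """Smooth shrinkage, assumed here to be c * tanh(grad / c).

    This form is an assumption for illustration; the summary does not
    say which smooth shrinkage function the paper uses.
    """
    return c * np.tanh(grad / c)


def spectral_norm(a):
    """Largest singular value of a 2-D array."""
    return np.linalg.norm(a, 2)


rng = np.random.default_rng(0)
m, n = 64, 32

# Toy low-rank "signal" gradient plus light Gaussian noise.
signal = np.outer(rng.standard_normal(m), rng.standard_normal(n)) / 8.0
noise = 0.1 * rng.standard_normal((m, n))

# Replace roughly 1% of the noise entries with Cauchy draws
# (a hypothetical contamination model for this sketch).
mask = rng.random((m, n)) < 0.01
noise = np.where(mask, rng.standard_cauchy((m, n)), noise)
grad = signal + noise

c = 1.0
clipped = clip_entrywise(grad, c)
shrunk = shrink_smooth(grad, c)

print("spectral norm, raw:     ", spectral_norm(grad))
print("spectral norm, clipped: ", spectral_norm(clipped))
print("spectral norm, shrunk:  ", spectral_norm(shrunk))
print("worst-case bound c*sqrt(m*n):", c * np.sqrt(m * n))
```

The only guarantee this sketch relies on is elementary: once every entry is bounded by `c`, the Frobenius norm, and therefore the spectral norm, is at most `c * sqrt(m * n)`. Both maps cost one pass over the entries and need no singular value decomposition.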
Papers · Reinforcement learning · Benchmarks

Summary generated by Claude — human-verified