arXiv cs.LG·28 May 2026

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

Signal

Hype

In three linesSparseOpt, a sparsity-aware optimizer, addresses gradient skew induced by Batch Normalization in Dynamic Sparse Training. Experiments on ResNet (CIFAR-100, ImageNet) show faster convergence and improved generalization. First systematic study of interactions between Batch Normalization, sparse layers, and DST.

Read source

Your take?

Papers Benchmarks Fine-tuning

Summary generated by Claude — human-verified

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

Other angles on this story