SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training
Signal: 72
Hype: 18
In three lines: SparseOpt, a sparsity-aware optimizer, addresses the gradient skew that Batch Normalization induces in Dynamic Sparse Training (DST). Experiments with ResNet on CIFAR-100 and ImageNet show faster convergence and improved generalization. The work is presented as the first systematic study of the interactions between Batch Normalization, sparse layers, and DST.
Summary generated by Claude — human-verified