Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers
Signal: 72
Hype: 15
In three lines: Theoretical paper proposing optimizers that respect the symmetries of modern neural architectures. It introduces equivariant update rules for embeddings, LM heads, SwiGLU MLPs, and MoE routers. Experiments on dense and sparse MoE models (Qwen3, Gemma 3, OLMoE, gpt-oss) show improved validation loss compared with AdamW.
Summary generated by Claude — human-verified