arXiv cs.AI

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

Signal: 72
Hype: 15
In three lines: A theoretical paper proposing optimizers that respect the symmetries of modern neural architectures. It introduces equivariant update rules for embeddings, LM heads, SwiGLU MLPs, and MoE routers. Experiments on dense and sparse MoE models (Qwen3, Gemma 3, OLMoE, gpt-oss) show lower validation loss than AdamW.
Papers, Reinforcement learning, Benchmarks, Qwen, Gemini

Summary generated by Claude — human-verified