Unlocking Feature Learning in Gated Delta Networks at Scale
Signal: 72
Hype: 15
In three lines
Study of scaling rules for Gated Delta Networks via μP (Maximal Update Parametrization). The authors derive optimal parametrizations for learning-rate transfer across model widths. Experimental validation on LLM pre-training with AdamW and SGD.
Summary generated by Claude — human-verified