arXiv cs.LG

Unlocking Feature Learning in Gated Delta Networks at Scale

Signal: 72 · Hype: 15
In three lines: A study of scaling rules for Gated Delta Networks under μP (Maximal Update Parametrization). The authors derive optimal parametrizations for learning-rate transfer across model widths. They validate the rules experimentally on LLM pre-training with AdamW and SGD.
Reasoning · Benchmarks · Papers

Summary generated by Claude — human-verified