Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds
Signal: 78
Hype: 15
In three lines
A complete first-order analysis of gradient dynamics in transformer attention heads under cross-entropy training. The authors establish an advantage-based routing law and responsibility-weighted value updates, and show that optimization creates Bayesian manifolds that implement in-context probabilistic reasoning.
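The summary does not reproduce the paper's equations, so the following is only a minimal sketch of what an advantage-style score gradient and a responsibility-weighted value gradient can look like. It applies the ordinary chain rule to a single query of one softmax attention head, followed by a linear readout and cross-entropy loss. The setup, the readout matrix `W`, and all function names are illustrative assumptions, not the authors' notation or method.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def forward(scores, values, W, y):
    """Return attention weights, class probabilities, and cross-entropy loss."""
    a = softmax(scores)
    d = len(values[0])
    # Head output: attention-weighted mix of the value vectors.
    o = [sum(a[j] * values[j][k] for j in range(len(a))) for k in range(d)]
    # Hypothetical linear readout W maps the head output to class logits.
    p = softmax([sum(row[k] * o[k] for k in range(d)) for row in W])
    return a, p, -math.log(p[y])

def analytic_grads(scores, values, W, y):
    """First-order gradients of the loss w.r.t. attention scores and values."""
    a, p, _ = forward(scores, values, W, y)
    d = len(values[0])
    # dL/dlogits = p - onehot(y)
    delta = [p[c] - (1.0 if c == y else 0.0) for c in range(len(p))]
    # g = dL/d(output) = W^T delta
    g = [sum(W[c][k] * delta[c] for c in range(len(W))) for k in range(d)]
    # b_j = g . v_j: how strongly value j aligns with the upstream gradient.
    b = [sum(g[k] * v[k] for k in range(d)) for v in values]
    b_mean = sum(aj * bj for aj, bj in zip(a, b))
    # Score gradient: a_j * (b_j - attention-weighted mean of b).
    grad_scores = [aj * (bj - b_mean) for aj, bj in zip(a, b)]
    # Value gradient: each value receives g scaled by its attention weight.
    grad_values = [[aj * gk for gk in g] for aj in a]
    return grad_scores, grad_values

def numeric_grad_scores(scores, values, W, y, eps=1e-6):
    """Central finite differences, used only to check the analytic form."""
    grads = []
    for j in range(len(scores)):
        up = scores[:j] + [scores[j] + eps] + scores[j + 1:]
        down = scores[:j] + [scores[j] - eps] + scores[j + 1:]
        diff = forward(up, values, W, y)[2] - forward(down, values, W, y)[2]
        grads.append(diff / (2 * eps))
    return grads

# Toy inputs (made up for illustration): 3 keys, 2-dim values, 3 classes.
scores = [0.2, -0.5, 1.0]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
W = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.8]]
y = 0

grad_scores, grad_values = analytic_grads(scores, values, W, y)
print("score grads (analytic):", grad_scores)
print("score grads (numeric): ", numeric_grad_scores(scores, values, W, y))
```

In this sketch the score gradient is a mean-centered quantity scaled by the attention weight, so a gradient descent step raises the score of keys whose values reduce the loss more than the attention-weighted average and lowers the rest. The value gradient is the same upstream vector for every key, scaled by how much attention that key received. Whether these match the paper's exact definitions of "advantage" and "responsibility" is not something the summary confirms.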
Summary generated by Claude — human-verified