arXiv cs.AI

Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

Signal: 78 · Hype: 15
In three lines: A complete first-order analysis of gradient dynamics in transformer attention heads under cross-entropy training. The authors establish an advantage-based routing law and responsibility-weighted value updates, and show that optimization creates Bayesian manifolds that implement in-context probabilistic reasoning.
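The summary does not spell out the two update rules. As a rough illustration, and not the authors' derivation, the sketch below computes the standard first-order gradients for a single query in one softmax attention head. The score gradient is the attention weight times how far each value's compatibility with the upstream gradient sits from its attention-weighted mean, which is one plausible reading of "advantage-based routing". The value gradient is the upstream gradient scaled by the attention weight, one plausible reading of "responsibility-weighted". All names (`head_gradients`, `surrogate_loss`, `upstream`) are hypothetical, and the linear surrogate loss stands in for the cross-entropy signal arriving from later layers.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def head_gradients(scores, values, upstream):
    """Gradients for one query of a softmax attention head.

    scores:   pre-softmax attention scores, one per key (length n)
    values:   value vectors, one per key (n lists of length d)
    upstream: dL/d(output) for this query (length d), assumed given
    """
    alpha = softmax(scores)
    # Compatibility of each value vector with the upstream gradient.
    compat = [dot(upstream, v) for v in values]
    # Attention-weighted mean compatibility acts as a baseline.
    baseline = sum(a * c for a, c in zip(alpha, compat))
    # dL/ds_j = alpha_j * (compat_j - baseline): an advantage-style term.
    grad_scores = [a * (c - baseline) for a, c in zip(alpha, compat)]
    # dL/dv_j = alpha_j * upstream: each value's share of the error signal.
    grad_values = [[a * u for u in upstream] for a in alpha]
    return alpha, grad_scores, grad_values

def surrogate_loss(scores, values, upstream):
    # Linear stand-in loss whose gradient w.r.t. the head output is `upstream`.
    alpha = softmax(scores)
    d = len(upstream)
    output = [sum(a * v[k] for a, v in zip(alpha, values)) for k in range(d)]
    return dot(upstream, output)
```

Under gradient descent the score for a key rises when its value reduces the loss more than the attention-weighted average does, and falls otherwise. The score gradients always sum to zero across keys, so a step redistributes attention rather than inflating every score at once.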
Reasoning · Papers · Benchmarks

Summary generated by Claude — human-verified