arXiv cs.AI

Training Infinitely Deep and Wide Transformers

Signal: 75 · Hype: 15
In three lines: A theoretical paper on training transformers in the mean-field regime (infinite depth and width). The authors model training as the control of a neural PDE (versus an ODE for ResNets), establish well-posedness of the forward pass, derive explicit formulas for the Wasserstein gradients, and prove that gradient flow converges to global minima under NTK injectivity conditions.
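To make the ODE-versus-PDE contrast concrete, here is a minimal sketch of the standard continuous-depth picture. It is not taken from the paper: the symbols x, f, \theta, \mu_t, \mathcal{X}, Q, K and W_V are illustrative assumptions, and the paper's exact formulation may differ. A ResNet moves a single state along an ODE, while a transformer moves a whole distribution of tokens \mu_t, which interact through attention, along a continuity equation.

```latex
% ResNet in the infinite-depth limit: one state x(t) follows an ODE
% (f and \theta are assumed notation for the residual block and its weights).
\dot{x}(t) = f\bigl(x(t), \theta(t)\bigr)

% Transformer in the infinite-depth limit: the token distribution \mu_t
% follows a continuity equation, i.e. a PDE on the space of measures.
\partial_t \mu_t + \nabla \cdot \bigl( \mu_t \, \mathcal{X}[\mu_t; \theta(t)] \bigr) = 0

% One common choice of attention velocity field (softmax self-attention),
% with assumed query, key and value matrices Q, K, W_V:
\mathcal{X}[\mu; \theta](x)
  = \frac{\int e^{\langle Q x,\, K y \rangle} \, W_V \, y \; d\mu(y)}
         {\int e^{\langle Q x,\, K y \rangle} \; d\mu(y)}
```

Under this reading, the velocity of each token depends on the full distribution \mu_t, which is why the forward pass becomes a PDE rather than an ODE.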
Reasoning · Papers · Benchmarks

Summary generated by Claude — human-verified