arXiv cs.LG

Simply Stabilizing the Loop via Fully Looped Transformer

Signal 72 · Hype 18
In three lines
Fully Looped Transformer targets the training instability of looped models, which reuse the same Transformer blocks across iterations, using two parameter-free modifications: inter-loop signal distribution and attention injection.
The model stays stable at up to 12 loop iterations, and the authors report a 13.2% improvement in downstream performance.
Because the blocks are shared, inference-time compute can be adjusted by controlling the number of loop iterations.
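The looping idea itself can be sketched generically: one block's weights are reused on every pass, and the number of passes is chosen at inference time. This is a minimal, hypothetical Python sketch, not the paper's model. `SharedBlock`, `looped_forward`, and `n_loops` are illustrative names. The block is a toy single-head self-attention layer with a residual connection (no MLP, no normalization). It deliberately omits the paper's two modifications (inter-loop signal distribution and attention injection), whose details this summary does not give.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedBlock:
    """Hypothetical toy stand-in for one Transformer block: single-head
    self-attention plus a residual connection. Its weights are created once
    and reused on every loop iteration (weight tying)."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]
                    for _ in range(d)]

        self.d = d
        self.Wq, self.Wk, self.Wv = init(), init(), init()

    def __call__(self, xs):
        qs = [matvec(self.Wq, x) for x in xs]
        ks = [matvec(self.Wk, x) for x in xs]
        vs = [matvec(self.Wv, x) for x in xs]
        out = []
        for q, x in zip(qs, xs):
            # Scaled dot-product scores of this query against every key.
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(self.d)
                      for k in ks]
            weights = softmax(scores)
            attended = [sum(w * v[j] for w, v in zip(weights, vs))
                        for j in range(self.d)]
            # Residual connection: add the attention output to the input.
            out.append([xi + ai for xi, ai in zip(x, attended)])
        return out


def looped_forward(block, xs, n_loops):
    """Apply the same block n_loops times. n_loops is the inference-time
    compute knob; the parameter count does not change with it."""
    calls = 0
    for _ in range(n_loops):
        xs = block(xs)
        calls += 1
    return xs, calls


if __name__ == "__main__":
    block = SharedBlock(d=4)
    tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.1, 0.2, 0.0]]
    for n in (1, 4, 12):
        out, calls = looped_forward(block, tokens, n)
        print(n, calls, len(out))  # prints "1 1 3", then "4 4 3", then "12 12 3"
```

The only thing that changes between the three runs is `n_loops`: the same three weight matrices are used each time, so more iterations cost more compute without adding parameters.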
Reasoning · Papers · Benchmarks

Summary generated by Claude — human-verified