arXiv cs.AI

One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

Signal: 72
Hype: 18
In three lines: A study of a shared-weight recurrent Transformer architecture (AIR) that develops two distinct roles without modular partitioning. On Sudoku-Extreme and Maze, the state zH acts as a committed proposal while zL retains local uncertainty. Freezing experiments and ablations show that asymmetry in input injection induces this functional specialization.
Tags: Reasoning, Papers

Summary generated by Claude — human-verified