The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
Signal: 78
Hype: 15
In three lines: A theoretical analysis of standard softmax transformers at low precision, proving that they can simulate Turing machines via Chain-of-Thought. The authors first construct hardmax transformers with ternary activations, then convert them to equivalent softmax transformers without unrealistic parameter magnitudes. The results are validated on Sudoku reasoning.
Summary generated by Claude — human-verified