arXiv cs.CL

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Signal: 78 | Hype: 15
In three lines:
A theoretical analysis of standard transformers that use softmax attention and low numerical precision, proving that with Chain-of-Thought they can simulate Turing machines.
The authors first construct hardmax transformers with ternary activations, then convert them into equivalent softmax transformers without requiring unrealistically large parameter magnitudes.
The results are validated on a Sudoku reasoning task.
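To make the hardmax/softmax distinction concrete, here is a generic toy sketch. It is not the authors' construction, and several assumptions are ours: "hardmax" is taken to mean average-hard attention (uniform weight on every position tied for the maximum score), "low precision" is modeled as rounding each attention weight to `frac_bits` fractional bits, and the helper names `hardmax`, `softmax`, `quantize` and `min_scale` are hypothetical. With infinite precision, softmax only approaches hardmax as the scale grows without bound. Once weights are rounded, however, a finite and fairly small scale can reproduce hardmax exactly on ternary-valued scores.

```python
import math

def hardmax(scores):
    """Average-hard attention (assumed definition): uniform weight on all tied maxima."""
    top = max(scores)
    winners = [i for i, s in enumerate(scores) if s == top]
    return [1.0 / len(winners) if i in winners else 0.0 for i in range(len(scores))]

def softmax(scores, scale):
    """Softmax over scale * score, stabilised by subtracting the max score first."""
    top = max(scores)
    exps = [math.exp(scale * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quantize(weights, frac_bits):
    """Hypothetical low-precision model: round to the nearest multiple of 2**-frac_bits."""
    step = 2 ** frac_bits
    return [round(w * step) / step for w in weights]

def min_scale(scores, frac_bits, max_scale=100):
    """Smallest integer scale where quantized softmax equals hardmax exactly, else None."""
    target = hardmax(scores)
    for scale in range(1, max_scale + 1):
        if quantize(softmax(scores, scale), frac_bits) == target:
            return scale
    return None

# Usage on illustrative ternary scores (values in {-1, 0, 1}).
scores = [1, 0, -1, 1]
print(hardmax(scores))       # [0.5, 0.0, 0.0, 0.5]
print(min_scale(scores, 8))  # 6
print(min_scale(scores, 16)) # 12
```

In this toy setting, the required scale grows roughly linearly with the number of bits: the off-maximum weights only need to fall below half a rounding step, about 2^-b, which takes a scale of about b·ln 2. Small weights and low precision can therefore coexist. This is only an intuition pump and does not stand in for the paper's actual conversion argument.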
Reasoning · Papers · Benchmarks

Summary generated by Claude — human-verified