Parallax: Parameterized Local Linear Attention for Language Modeling
Signal: 78
Hype: 25
In three lines: Parallax is a parameterized Local Linear Attention mechanism for LLMs, derived from statistical regression. It replaces the local constant estimate computed by softmax attention with a local linear estimate, which yields a better bias-variance tradeoff. Pretrained at 0.6B and 1.7B scales, Parallax shows consistent perplexity improvements and matches or outperforms FlashAttention 2/3 in decoding.
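To make the local constant versus local linear distinction concrete, here is a minimal NumPy sketch of the textbook regression view for a single query. It is not the Parallax parameterization or its kernel: the function names, the small `ridge` term (added here only for numerical stability), and the toy data are all assumptions made for illustration. Softmax attention output is the weighted mean of the values (a local constant fit), while the local linear variant fits an intercept plus a slope in `k - q` under the same weights and returns the intercept.

```python
import numpy as np

def softmax_weights(q, K):
    """Softmax kernel weights of one query q (d,) against keys K (n, d)."""
    scores = K @ q / np.sqrt(q.shape[0])
    scores = scores - scores.max()          # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def local_constant(q, K, V):
    """Standard softmax attention: the weighted mean of the values.

    This is the minimizer over b of sum_i w_i * ||v_i - b||^2.
    """
    return softmax_weights(q, K) @ V

def local_linear(q, K, V, ridge=1e-8):
    """Illustrative local linear estimate (hypothetical helper, not Parallax).

    Minimizes sum_i w_i * ||v_i - b - W^T (k_i - q)||^2 + ridge * ||W||^2
    and returns the intercept b, i.e. the fitted value at k = q.
    """
    w = softmax_weights(q, K)
    n, d = K.shape
    X = np.hstack([np.ones((n, 1)), K - q])  # design matrix: intercept + centered keys
    XtW = X.T * w                            # (d+1, n), each column scaled by w_i
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0                          # leave the intercept unpenalized
    beta = np.linalg.solve(XtW @ X + reg, XtW @ V)
    return beta[0]

# Toy check: values that are an exact affine function of the keys.
rng = np.random.default_rng(0)
K = rng.normal(size=(64, 4))
q = rng.normal(size=4)
M = rng.normal(size=(4, 3))
c = rng.normal(size=3)
V = K @ M + c
target = q @ M + c                           # the affine map evaluated at the query

print("local constant error:", np.linalg.norm(local_constant(q, K, V) - target))
print("local linear error:  ", np.linalg.norm(local_linear(q, K, V) - target))
```

On affine data like this, the local linear fit recovers the target up to the ridge perturbation, whereas the weighted mean is biased whenever the weighted average key differs from the query. The extra slope parameters are what trade some variance for that bias reduction; as `ridge` grows large, the slope shrinks toward zero and the estimate falls back to ordinary softmax attention.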
Summary generated by Claude — human-verified