arXiv cs.AI

Attention Sinks and Outliers in Attention Residuals

In three lines: OASIS, an inter-layer null signaling technique, reduces attention sinks and activation outliers in AttnResidual architectures. Across three datasets, OASIS reduces the maximum infinity norm by 9.26% and the average kurtosis by 2.60%, and improves post-quantization performance (W8A8: -75.85% perplexity; W4A4: +12.42% GSM8K).
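
For readers unfamiliar with the two outlier metrics named above, here is a minimal sketch of how they are commonly computed. It is not the paper's evaluation code: the function names, the choice to aggregate per token (maximum of the per-token infinity norms, mean of the per-token kurtoses), and the sample activations are all assumptions made for illustration.

```python
from typing import Sequence


def infinity_norm(values: Sequence[float]) -> float:
    """Largest absolute entry of one activation vector."""
    return max(abs(v) for v in values)


def kurtosis(values: Sequence[float]) -> float:
    """Pearson (non-excess) kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def max_infinity_norm(rows: Sequence[Sequence[float]]) -> float:
    """Assumed aggregation: worst-case infinity norm over all token vectors."""
    return max(infinity_norm(row) for row in rows)


def average_kurtosis(rows: Sequence[Sequence[float]]) -> float:
    """Assumed aggregation: mean of the per-token kurtosis values."""
    return sum(kurtosis(row) for row in rows) / len(rows)


# Hypothetical activations, one row per token (values invented for illustration).
flat_token = [1.0, -1.0, 1.0, -1.0]   # evenly spread magnitudes
spiky_token = [0.0, 0.0, 0.0, 8.0]    # one dominant outlier channel

activations = [flat_token, spiky_token]
print(max_infinity_norm(activations))
print(average_kurtosis(activations))
```

The token with a single dominant channel scores higher on both metrics than the evenly spread one. Large single-channel values of this kind are what stretch a low-bit quantization range, which is why lower values on both metrics tend to go together with better W8A8 and W4A4 results.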

Summary generated by Claude — human-verified