arXiv cs.LG

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

Signal: 78
Hype: 15
In three lines: ThriftAttention uses mixed precision (FP16/FP4) for long-context attention on Blackwell GPUs. By computing the 5% of query-key blocks selected as critical in FP16 and the remaining blocks in FP4, the method recovers 89.1% of FP16 performance while maintaining FP4 efficiency. Code released.
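To make the block-selection idea concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The summary does not say how critical blocks are chosen or which FP4 format is used, so the sketch makes several assumptions: `fake_fp4` simulates FP4 by rounding to the E2M1 value grid with one per-tensor scale, block importance is a hypothetical proxy (dot products of mean-pooled query and key blocks), and the names `fake_fp4`, `mixed_precision_scores`, `block`, and `keep_frac` are illustrative only.

```python
import numpy as np

# Magnitudes representable in the E2M1 FP4 format (assumed format; sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def fake_fp4(x):
    """Simulate FP4 by snapping values to the E2M1 grid with one per-tensor scale.

    Simplification: real FP4 kernels typically use finer-grained scaling.
    """
    scale = np.abs(x).max() / FP4_GRID[-1]
    if scale == 0:
        return x.copy()
    # Index of the nearest grid magnitude for every element.
    idx = np.abs(np.abs(x)[..., None] / scale - FP4_GRID).argmin(axis=-1)
    return np.sign(x) * FP4_GRID[idx] * scale


def mixed_precision_scores(q, k, block=16, keep_frac=0.05):
    """Return attention logits with a small fraction of blocks computed in FP16.

    q: (n_q, d), k: (n_k, d); n_q and n_k must be multiples of `block`.
    Also returns the boolean (n_q/block, n_k/block) mask of FP16 blocks.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    nbq, nbk = n_q // block, n_k // block

    # Hypothetical importance proxy: score of mean-pooled query/key blocks.
    q_pool = q.reshape(nbq, block, d).mean(axis=1)
    k_pool = k.reshape(nbk, block, d).mean(axis=1)
    importance = q_pool @ k_pool.T

    # Mark the top `keep_frac` of blocks (at least one) for FP16.
    n_keep = max(1, int(round(keep_frac * nbq * nbk)))
    top = np.argsort(importance.ravel())[-n_keep:]
    hi_flat = np.zeros(nbq * nbk, dtype=bool)
    hi_flat[top] = True
    hi_mask = hi_flat.reshape(nbq, nbk)

    # Low-precision logits everywhere, FP16 logits for comparison.
    lo = fake_fp4(q) @ fake_fp4(k).T / np.sqrt(d)
    hi = (q.astype(np.float16) @ k.astype(np.float16).T).astype(np.float64) / np.sqrt(d)

    # Expand the block mask to element resolution and merge the two results.
    full = np.repeat(np.repeat(hi_mask, block, axis=0), block, axis=1)
    return np.where(full, hi, lo), hi_mask
```

For clarity the sketch computes both precisions for every block and then merges them, which saves nothing. An efficient kernel would dispatch each block to only one precision path.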
Benchmarks, Infrastructure, Reasoning

Summary generated by Claude — human-verified