Reddit r/LocalLLaMA

Anyone using Flash Attention 2 (ai-bond) on their V100's? How is the performance?

In three lines: A user benchmarks Flash Attention 2 (ai-bond) on the V100. Results show a 7-24x speedup in the backward pass and a memory reduction of up to 91.9% (323.4 MB saved). Thinking time before answering is minimized. Numerical validation passes on causal and non-causal configurations.
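
For readers unsure what "numerical validation on causal and non-causal configurations" usually involves, here is a minimal pure-Python sketch. It is not the ai-bond kernel or its test suite: the function names, the single-head layout, and the block size are all assumptions made for illustration. It compares a naive attention that builds each full score row against a blockwise version using a running softmax (the general idea behind Flash Attention), with and without a causal mask.

```python
import math
import random

def reference_attention(q, k, v, causal):
    """Naive attention: builds the full score row for each query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        last = i + 1 if causal else len(k)  # causal: attend to positions <= i
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(last)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][d] for j, w in enumerate(weights)) / total
                    for d in range(len(v[0]))])
    return out

def tiled_attention(q, k, v, causal, block=4):
    """Blockwise attention with a running softmax (hypothetical helper)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        last = i + 1 if causal else len(k)
        m, total, acc = float("-inf"), 0.0, [0.0] * len(v[0])
        for start in range(0, last, block):
            idx = range(start, min(start + block, last))
            scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in idx]
            new_m = max(m, max(scores))
            correction = math.exp(m - new_m)  # rescale earlier partial sums
            total *= correction
            acc = [a * correction for a in acc]
            for j, s in zip(idx, scores):
                w = math.exp(s - new_m)
                total += w
                acc = [a + w * x for a, x in zip(acc, v[j])]
            m = new_m
        out.append([a / total for a in acc])
    return out

def max_abs_diff(a, b):
    """Largest elementwise difference between two row-major matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

random.seed(0)
seq_len, head_dim = 10, 8  # illustrative sizes, not taken from the post
q, k, v = ([[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
           for _ in range(3))

for causal in (False, True):
    err = max_abs_diff(reference_attention(q, k, v, causal),
                       tiled_attention(q, k, v, causal))
    print(f"causal={causal}: max abs diff = {err:.2e}")
```

A real GPU check would compare half-precision kernel outputs and gradients against a framework reference with a looser tolerance; the post's summary does not say which tolerance or reference was used.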
Infrastructure · Benchmarks · Open source

Summary generated by Claude — human-verified