llama: use f16 mask for FA to save VRAM by am17an · Pull Request #23764 · ggml-org/llama.cpp
Signal: 65
Hype: 15
In three lines: llama.cpp PR #23764 uses f16 masks in Flash Attention (FA) to reduce VRAM consumption. Halving the mask's element size frees GPU memory, which can help larger models fit on the GPU.
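To get a feel for the scale of the saving, here is a rough back-of-the-envelope sketch. It assumes a dense attention mask with one element per (KV position, query token) pair. The shapes `n_kv` and `n_tokens` and the helper `mask_bytes` are hypothetical values and names chosen for illustration, not taken from the PR, and the actual saving depends on implementation details the summary does not give.

```python
def mask_bytes(n_kv: int, n_tokens: int, bytes_per_elem: int) -> int:
    """Size in bytes of a dense n_kv x n_tokens attention mask (assumed layout)."""
    return n_kv * n_tokens * bytes_per_elem


# Hypothetical shapes, chosen only for illustration.
n_kv = 32768      # KV cache positions
n_tokens = 2048   # query tokens in one batch

f32 = mask_bytes(n_kv, n_tokens, 4)  # f32: 4 bytes per element
f16 = mask_bytes(n_kv, n_tokens, 2)  # f16: 2 bytes per element
saved_mib = (f32 - f16) / 2**20

print(f"f32 mask: {f32 / 2**20:.0f} MiB")
print(f"f16 mask: {f16 / 2**20:.0f} MiB")
print(f"saved:    {saved_mib:.0f} MiB")
```

Under these assumptions the f16 mask is half the size of the f32 one, so the saving grows with both context length and batch size.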
Summary generated by Claude — human-verified