CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp
Signal: 75 · Hype: 15
In three lines: A CUDA implementation of the Fast Walsh-Hadamard Transform (FWHT) for llama.cpp, aimed at optimizing KV-cache quantization. Reported speedups are 1-2% on prefill and 7-9% on token generation, measured on an RTX 5090 with q8_0 quantization.
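For readers unfamiliar with the transform, here is a minimal CPU reference sketch of an in-place butterfly FWHT in Python. It is not the PR's CUDA kernel, and the function name `fwht` is hypothetical. It assumes an unnormalized transform in natural (Hadamard) ordering and a power-of-two input length.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform in natural (Hadamard) order.

    Illustrative reference only; `fwht` is a hypothetical name and this is
    not the kernel from the pull request.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")

    out = list(values)  # work on a copy so the caller's data is untouched
    h = 1
    while h < n:
        # Each pass combines elements that are h apart: (a, b) -> (a + b, a - b).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

The loop runs log2(n) passes of n/2 butterflies each, so the cost is O(n log n) additions and subtractions with no multiplications. Applying the unnormalized transform twice returns the input scaled by n.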
Summary generated by Claude — human-verified