Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM
Signal: 72
Hype: 28
In three lines: KV cache quantization benchmarks on an RTX 3090 with Qwen 27B. TurboQuant is overrated except for TCQ (best at 2-3 bits), q5 is underrated, and asymmetric q4_0 beats symmetric q4_1. KLD exposes tail issues that PPL hides, and llama.cpp rotation matches turbo4 performance.
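To see why KLD can expose problems that PPL hides, here is a minimal toy sketch. It is not the benchmark's method, and the distributions are invented for illustration. The helper names (`kl_divergence`, `perplexity`) are hypothetical. Perplexity only looks at the probability assigned to the correct token, while KL divergence compares the whole output distribution of the quantized run against the baseline.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(dists, targets):
    """exp of the mean negative log-probability of the target tokens."""
    nll = [-math.log(d[t]) for d, t in zip(dists, targets)]
    return math.exp(sum(nll) / len(nll))

# Made-up next-token distributions over a 4-token vocabulary, 3 positions.
# "base" stands in for an unquantized KV cache, "quant" for a quantized one.
base = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]
quant = [
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.5, 0.05, 0.35, 0.1],  # same target probability, reshuffled tail
]
targets = [0, 0, 0]  # index of the correct token at each position

ppl_base = perplexity(base, targets)
ppl_quant = perplexity(quant, targets)
klds = [kl_divergence(p, q) for p, q in zip(base, quant)]

print(f"PPL base={ppl_base:.4f} quant={ppl_quant:.4f}")
print(f"KLD mean={sum(klds) / len(klds):.4f} max={max(klds):.4f}")
```

In this toy case the two perplexities are identical because the target-token probabilities were left unchanged, while the KLD at the last position is about 0.41 nats. Real benchmarks would typically report percentiles of the per-token KLD over a large corpus rather than a single maximum.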
Summary generated by Claude — human-verified