I'm still surprised at how good the KV quantization has become
Signal: 45
Hype: 25
In three lines
An r/LocalLLaMA user reports that KV (key-value) cache quantization has reached impressive quality: even with KV at q4_0 (including the drafter), the model accurately retrieves information within a 100k-token context.
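For a sense of why q4_0 KV matters at this context length, here is a rough back-of-the-envelope sketch of cache size at 100k tokens. It assumes the q4_0 mentioned is the ggml block format (32 values stored in 18 bytes, about 4.5 bits per value). The model shape (layers, KV heads, head dimension) is hypothetical and not taken from the post.

```python
# Rough KV-cache size estimate; the model dimensions below are hypothetical.
F16_BITS = 16.0
# ggml q4_0: each block of 32 values is 16 bytes of 4-bit codes plus one
# 2-byte f16 scale, so 18 bytes / 32 values = 4.5 bits per value.
Q4_0_BITS = 18 * 8 / 32


def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bits_per_value):
    """Bytes needed for the K and V tensors across all layers."""
    # Two tensors (K and V) per layer, each n_tokens x n_kv_heads x head_dim.
    n_values = 2 * n_layers * n_tokens * n_kv_heads * head_dim
    return n_values * bits_per_value / 8


# Hypothetical model shape, for illustration only.
shape = dict(n_tokens=100_000, n_layers=32, n_kv_heads=8, head_dim=128)
f16 = kv_cache_bytes(bits_per_value=F16_BITS, **shape)
q4 = kv_cache_bytes(bits_per_value=Q4_0_BITS, **shape)

print(f"f16 : {f16 / 2**30:.2f} GiB")
print(f"q4_0: {q4 / 2**30:.2f} GiB")
print(f"ratio: {f16 / q4:.2f}x")
```

Under these assumptions the saving is the ratio of bits per value (16 / 4.5) and does not depend on the model shape. The shape only sets the absolute sizes.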
Summary generated by Claude — human-verified