Reddit r/LocalLLaMA

GraphKV: KV cache optimization based on graph embedding models

Signal: 45 · Hype: 25
In three lines: GraphKV is a KV cache compression project that uses graph embedding models. It reports 7.76x compression on GPT-2 (cosine similarity 0.999949) and 3.36x on Qwen2.5-7B at 32k tokens (cosine similarity 0.990316). It is inspired by TurboQuant and uses int2/int4/NF4 quantization.
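To make the two reported metrics concrete, here is a minimal sketch of how a compression ratio and a cosine similarity could be measured for a quantized cache tensor. It is not GraphKV's method: the summary does not describe the graph-embedding step, so this uses plain symmetric per-group int4 quantization on synthetic Gaussian data. The function names, the group size of 32, and the 16-bit source and scale widths are all assumptions made for illustration.

```python
import math
import random


def quantize_int4(values, group_size=32):
    """Symmetric per-group 4-bit quantization (illustrative, not GraphKV's scheme).

    Each group shares one scale; codes are integers in [-7, 7].
    """
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(v / scale))) for v in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=32):
    """Map integer codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compression_ratio(n_values, group_size=32, source_bits=16,
                      code_bits=4, scale_bits=16):
    """Original size divided by compressed size, counting per-group scales."""
    n_groups = math.ceil(n_values / group_size)
    original = n_values * source_bits
    compressed = n_values * code_bits + n_groups * scale_bits
    return original / compressed


# Synthetic stand-in for a slice of a KV cache (assumption: Gaussian values).
random.seed(0)
kv = [random.gauss(0.0, 1.0) for _ in range(4096)]

codes, scales = quantize_int4(kv)
restored = dequantize_int4(codes, scales)

cosine = cosine_similarity(kv, restored)
ratio = compression_ratio(len(kv))
print(f"cosine similarity: {cosine:.6f}")
print(f"compression ratio: {ratio:.2f}x")
```

Under these assumptions the ratio comes out below the naive 4x (16 bits down to 4 bits) because the per-group scales also have to be stored. A reported ratio above 4x, such as the 7.76x figure for GPT-2, would need something beyond int4 alone, for example int2 codes or additional structure in the encoding.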
Qwen · Code generation · Open source

Summary generated by Claude — human-verified