arXiv cs.CL

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression

Signal 78 · Hype 18
In three lines
NestedKV is a training-free method for compressing the KV cache of long-context models.
It maintains key anchors at three scales (global, block-level, sliding-window), scores tokens by a multi-time-scale cosine anomaly, and combines the resulting rankings through head-adaptive mixing and surprise-gated routing.
Compared with KeyDiff, it improves results by up to 19.10 points on RULER and 19.29 points on LongBench (Qwen3-4B, r = 0.75).
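The core scoring idea can be illustrated with a small sketch. It assumes that "cosine anomaly" means 1 minus the cosine similarity between a token's key and an anchor, and that each anchor is a mean of keys (over the whole sequence, over the token's fixed-size block, and over a trailing window). It also assumes that rankings are fused with a fixed per-head weight dictionary. The helper names, `block_size`, `window`, `weights`, and `keep_ratio` are all hypothetical. This is not the authors' implementation: the paper's head-adaptive weighting and surprise-gated routing are not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def anomaly_scores(keys, block_size=4, window=3):
    """Hypothetical multi-scale anomaly: 1 - cos(key, anchor) for three anchors.

    Anchors: global mean key, mean key of the token's block, and mean key of
    the trailing window (which includes the token itself).
    """
    global_anchor = mean(keys)
    scales = {"global": [], "block": [], "window": []}
    for i, k in enumerate(keys):
        start = (i // block_size) * block_size
        block_anchor = mean(keys[start:start + block_size])
        window_anchor = mean(keys[max(0, i - window + 1):i + 1])
        scales["global"].append(1.0 - cosine(k, global_anchor))
        scales["block"].append(1.0 - cosine(k, block_anchor))
        scales["window"].append(1.0 - cosine(k, window_anchor))
    return scales


def rank_normalize(scores):
    """Map scores to ranks in [0, 1]; the highest score gets 1.0, ties break by index."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1) if n > 1 else 1.0
    return ranks


def select_tokens(keys, weights, keep_ratio=0.25, block_size=4, window=3):
    """Keep the tokens with the highest weighted sum of per-scale rank scores.

    `weights` stands in for one head's mixing weights (assumed fixed here).
    Returns the kept token indices in ascending order.
    """
    scales = anomaly_scores(keys, block_size, window)
    ranked = {name: rank_normalize(s) for name, s in scales.items()}
    combined = [
        sum(weights[name] * ranked[name][i] for name in weights)
        for i in range(len(keys))
    ]
    k = max(1, round(keep_ratio * len(keys)))
    keep = sorted(range(len(keys)), key=lambda i: combined[i], reverse=True)[:k]
    return sorted(keep)


# Usage: eight 2-d keys where token 5 points in a different direction.
keys = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] + [[1.0, 0.0]] * 2
head_weights = {"global": 0.5, "block": 0.3, "window": 0.2}
print(select_tokens(keys, head_weights, keep_ratio=0.125))
```

In this toy input the outlier key is the most anomalous at every scale, so it is the single token retained. Ranking each scale before mixing keeps scores with different ranges comparable. That is one plausible reason to combine rankings rather than raw scores, though the paper's exact fusion rule may differ.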
Reasoning · Benchmarks · Qwen · Llama · Infrastructure

Summary generated by Claude — human-verified