arXiv cs.CL · 27 May 2026

NestedKV: Nested Memory Routing for Long-Context KV Cache Compression

NestedKV is a training-free method for compressing the KV cache of long-context models. It maintains key anchors at multiple scales (global, block-level, and sliding-window), scores each token by its cosine anomaly relative to those anchors across time scales, and combines the resulting rankings through head-adaptive mixing and surprise-gated routing. Against KeyDiff on Qwen3-4B at r = 0.75, it improves scores by up to 19.10 points on RULER and 19.29 points on LongBench.
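To make the scoring idea concrete, here is a toy single-head sketch in plain Python. It is not the authors' implementation. The anchor definitions (mean of all keys, mean of the token's fixed-size block, mean of the last `window` keys), the anomaly score of 1 minus cosine similarity, the weighted rank sum standing in for head-adaptive mixing, and all function and parameter names (`select_tokens`, `keep`, `block_size`, `window`, `weights`) are assumptions for illustration; surprise-gated routing is left out entirely.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def ranks_descending(scores: Sequence[float]) -> List[int]:
    """Rank 0 goes to the highest score (most anomalous token); ties keep index order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = position
    return ranks


def select_tokens(
    keys: Sequence[Vector],
    keep: int,
    block_size: int,
    window: int,
    weights: Sequence[float],
) -> List[int]:
    """Hypothetical multi-scale anomaly selection for ONE attention head.

    weights: per-head mixing weights for (global, block, window) rankings,
    an assumed stand-in for head-adaptive mixing.
    Returns the sorted indices of the `keep` tokens to retain.
    """
    if not keys:
        raise ValueError("keys must be non-empty")
    n = len(keys)
    # Assumed anchors at three scales.
    global_anchor = mean_vector(keys)
    window_anchor = mean_vector(keys[max(0, n - window):])
    block_anchors: Dict[int, Vector] = {
        start: mean_vector(keys[start:start + block_size])
        for start in range(0, n, block_size)
    }
    # Anomaly = 1 - cosine similarity to each anchor (higher = more distinctive).
    global_scores, block_scores, window_scores = [], [], []
    for i, k in enumerate(keys):
        block_anchor = block_anchors[(i // block_size) * block_size]
        global_scores.append(1.0 - cosine(k, global_anchor))
        block_scores.append(1.0 - cosine(k, block_anchor))
        window_scores.append(1.0 - cosine(k, window_anchor))
    # Combine rankings (not raw scores) with a weighted sum; lower is better.
    per_scale = [ranks_descending(s) for s in (global_scores, block_scores, window_scores)]
    mixed = [sum(w * r[i] for w, r in zip(weights, per_scale)) for i in range(n)]
    kept = sorted(range(n), key=lambda i: mixed[i])[:keep]
    return sorted(kept)
```

For example, with five identical keys `[1.0, 0.0]` followed by one orthogonal key `[0.0, 1.0]`, `select_tokens(keys, keep=1, block_size=3, window=3, weights=(1/3, 1/3, 1/3))` keeps only the outlier at index 5. Mixing ranks rather than raw scores keeps scales with different score ranges comparable; that choice is also an assumption, not something the summary specifies.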

Summary generated by Claude — human-verified