IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference
Signal: 82 | Hype: 15
In three lines
IntentKV is a KV cache pruning technique for multi-turn LLM agents. It maintains cross-turn intent memory and uses memory-attention rules to score historical tokens. On Qwen2.5-14B with an 8k budget, it reduces peak request tokens from 92.3k to 20.5k (−77.8%) and KV reads from 411M to 31M (−92.6%) with minimal accuracy loss.
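To make the idea of pruning under a fixed budget concrete, here is a minimal sketch of score-based token retention. It is not IntentKV's actual method: the summary does not describe how the memory-attention rules compute scores, so the scores are taken as given, and the function name `prune_kv_by_score` and the `keep_recent` safeguard are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def prune_kv_by_score(
    scores: Sequence[float], budget: int, keep_recent: int = 0
) -> List[int]:
    """Return sorted indices of cached tokens to keep under a fixed budget.

    `scores` holds one relevance score per historical token (assumed to come
    from some external scoring rule). `keep_recent` is a hypothetical
    safeguard that always retains the newest tokens.
    """
    n = len(scores)
    if budget >= n:
        return list(range(n))  # everything fits, nothing to prune
    if budget <= 0:
        return []

    # Reserve part of the budget for the most recent tokens.
    recent_count = min(keep_recent, budget, n)
    recent = list(range(n - recent_count, n))

    # Spend the remaining budget on the highest-scoring older tokens.
    remaining = budget - recent_count
    older = range(n - recent_count)
    top_older = sorted(older, key=lambda i: (-scores[i], i))[:remaining]

    # Keep original token order so positions stay monotonic.
    return sorted(top_older + recent)


scores = [0.9, 0.1, 0.4, 0.8, 0.2, 0.3]
kept = prune_kv_by_score(scores, budget=4, keep_recent=2)
print(kept)  # [0, 3, 4, 5]
```

In this toy run the two newest tokens (4 and 5) are kept unconditionally, and the remaining two slots go to the highest-scoring older tokens (0 and 3). A real system would apply the selection to the cached key and value tensors per layer rather than to a flat list.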
Summary generated by Claude — human-verified