arXiv cs.LG

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

Signal: 82 · Hype: 15
In three lines:
IntentKV is a KV cache pruning technique for multi-turn LLM agents.
It maintains a cross-turn intent memory and applies memory-attention rules to score historical tokens.
On Qwen2.5-14B with an 8k KV budget, it cuts peak request tokens from 92.3k to 20.5k (−77.8%) and KV reads from 411M to 31M (−92.6%), with minimal accuracy loss.
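The summary does not spell out IntentKV's actual scoring rules. As a rough illustration only, the sketch below assumes the intent memory is a single vector updated as a moving average of each turn's query embedding, scores cached tokens by softmax attention between that memory and each token's key, always keeps a small window of the most recent tokens, and fills the rest of the budget with the highest-scoring older tokens. The names `update_intent_memory`, `score_tokens`, `prune_kv`, `decay`, and `recent` are hypothetical and are not the paper's API.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def update_intent_memory(memory: Optional[List[float]], turn_query: Vector,
                         decay: float = 0.5) -> List[float]:
    """Hypothetical cross-turn intent memory: exponential moving average of turn queries."""
    if memory is None:
        return list(turn_query)
    return [decay * m + (1.0 - decay) * q for m, q in zip(memory, turn_query)]


def score_tokens(memory: Vector, keys: Sequence[Vector]) -> List[float]:
    """Softmax over scaled dot products between the intent memory and each cached key."""
    scale = math.sqrt(len(memory))
    logits = [sum(m * k for m, k in zip(memory, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prune_kv(keys: Sequence[Vector], memory: Vector, budget: int,
             recent: int = 2) -> List[int]:
    """Return sorted indices of cached tokens to keep under a token budget."""
    n = len(keys)
    if n <= budget:
        return list(range(n))  # nothing to prune
    n_recent = min(recent, budget)
    recent_idx = list(range(n - n_recent, n))  # always keep the newest tokens
    scores = score_tokens(memory, keys)
    older = sorted(range(n - n_recent), key=lambda i: scores[i], reverse=True)
    return sorted(older[: budget - n_recent] + recent_idx)


if __name__ == "__main__":
    memory = update_intent_memory(None, [1.0, 0.0])
    memory = update_intent_memory(memory, [1.0, 0.0])
    keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 0.5], [0.2, 0.2]]
    print(prune_kv(keys, memory, budget=3, recent=1))
```

Here the recent window protects the tokens the next step is most likely to need, while the intent score decides which older context survives; a real system would operate per layer and per head on tensors rather than Python lists.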
AI Agents · Reasoning · Infrastructure · Benchmarks

Summary generated by Claude — human-verified