arXiv cs.LG · 10 June 2026

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference


In three lines: IntentKV is a KV cache pruning technique for multi-turn LLM agents. It maintains a cross-turn intent memory and uses memory-attention rules to score historical tokens. On Qwen2.5-14B with an 8k budget, it reduces peak request tokens from 92.3k to 20.5k (−77.8%) and KV reads from 411M to 31M (−92.6%) with minimal accuracy loss.
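To make the idea of scoring historical tokens against an intent memory under a fixed budget more concrete, here is a minimal sketch. It is not the paper's algorithm: the scoring rule (maximum softmax attention weight from any intent vector), the `recent` window that is always kept, and the names `score_tokens` and `prune_to_budget` are all assumptions chosen for illustration.

```python
import math


def score_tokens(keys, intent_memory):
    """Score each cached token by its highest softmax attention weight
    under any intent vector (an assumed rule, not the paper's)."""
    dim = len(keys[0])
    scores = [0.0] * len(keys)
    for query in intent_memory:
        # Scaled dot-product logits of this intent vector against every key.
        logits = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        for i, e in enumerate(exps):
            scores[i] = max(scores[i], e / total)
    return scores


def prune_to_budget(keys, intent_memory, budget, recent=2):
    """Return sorted indices of cache entries to keep: the most recent
    tokens plus the best-scored historical tokens, up to `budget`."""
    n = len(keys)
    if n <= budget:
        return list(range(n))
    recent = min(recent, budget)
    protected = list(range(n - recent, n))  # always keep the newest tokens
    scores = score_tokens(keys, intent_memory)
    history = sorted(range(n - recent), key=lambda i: scores[i], reverse=True)
    return sorted(history[: budget - recent] + protected)


# Toy cache of six 2-d keys and a single hypothetical intent vector.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.8], [0.5, 0.5], [0.2, 0.2]]
intent_memory = [[4.0, 0.0]]
kept = prune_to_budget(keys, intent_memory, budget=4)
print(kept)
```

On this toy input the two newest entries (indices 4 and 5) are kept unconditionally, and the remaining two slots go to the historical keys most aligned with the intent vector (indices 0 and 2). A real system would score per layer and per head and operate on tensors rather than Python lists.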



Summary generated by Claude — human-verified
