arXiv cs.CL

KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference

Signal: 78 · Hype: 15
In three lines
KVDrive is a multi-tier KV cache management system for long-context LLM inference. It jointly handles KV cache placement across GPU memory, host DRAM, and SSD; pipeline scheduling; and coordination across tiers. The prototype achieves 1.74x higher throughput than state-of-the-art systems while preserving accuracy.
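The summary does not describe KVDrive's actual placement policy. For orientation only, here is a minimal sketch of what "multi-tier placement" means in general: a toy three-tier KV block store that demotes least-recently-used (LRU) blocks from GPU to DRAM to SSD and promotes a block back to GPU when it is accessed. The class name `TieredKVCache`, the block-count capacities, and the LRU policy are hypothetical assumptions, not KVDrive's design. The sketch also ignores transfer cost, asynchronous data movement, and pipeline scheduling entirely.

```python
from collections import OrderedDict

# Hypothetical tier order, fastest first. Not taken from the KVDrive paper.
TIERS = ("gpu", "dram", "ssd")


class TieredKVCache:
    """Toy multi-tier KV block store with LRU demotion (illustrative policy only)."""

    def __init__(self, capacities):
        # capacities: tier name -> max number of KV blocks; None means unbounded.
        self.capacities = capacities
        # One OrderedDict per tier; insertion order doubles as recency order.
        self.tiers = {t: OrderedDict() for t in TIERS}

    def _insert(self, level, block_id, data):
        tier = TIERS[level]
        store = self.tiers[tier]
        store[block_id] = data
        store.move_to_end(block_id)  # mark as most recently used
        cap = self.capacities.get(tier)
        if cap is not None and len(store) > cap:
            # Evict the least recently used block and demote it one tier down.
            victim_id, victim_data = store.popitem(last=False)
            if level + 1 >= len(TIERS):
                raise MemoryError("last tier is full")
            self._insert(level + 1, victim_id, victim_data)

    def put(self, block_id, data):
        # Drop any stale copy, then place the fresh block in the fastest tier.
        for store in self.tiers.values():
            store.pop(block_id, None)
        self._insert(0, block_id, data)

    def get(self, block_id):
        # Search tiers fastest-first; on a hit, promote the block back to GPU.
        for tier in TIERS:
            store = self.tiers[tier]
            if block_id in store:
                data = store.pop(block_id)
                self._insert(0, block_id, data)
                return data
        raise KeyError(block_id)

    def location(self, block_id):
        # Return the tier currently holding the block, or None if absent.
        for tier in TIERS:
            if block_id in self.tiers[tier]:
                return tier
        return None


cache = TieredKVCache({"gpu": 2, "dram": 2, "ssd": None})
for i in range(5):
    cache.put(i, f"kv{i}")
```

After five writes with room for two blocks each on GPU and DRAM, block 0 has been pushed down to SSD. Reading it promotes it back to GPU and cascades one demotion per tier. A real system would weigh what such a cascade costs in PCIe and SSD bandwidth, which this sketch does not model.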
Infrastructure · Reasoning

Summary generated by Claude — human-verified