arXiv cs.AI

KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy

Signal 78 · Hype 25
In three lines
KVCapsule compresses the key-value (KV) cache of vision-language models during autoregressive decoding. It exploits structured attention patterns among vision tokens to reach 2x higher throughput in tokens per second (TPS) and a 2.4x memory reduction at a 60% compression ratio, without modifying the pretrained backbone.
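This summary does not describe KVCapsule's actual algorithm. To illustrate what attention-guided KV cache compression can look like in general, here is a minimal Python sketch of a generic score-based eviction policy, which is not the paper's method. It assumes that each cached token carries an accumulated attention score, that only vision-token entries are compressed while text entries stay intact, and that `compression_ratio` means the fraction of vision entries removed. The function name `compress_vision_kv` and all of its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def compress_vision_kv(
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    attn_scores: Sequence[float],
    is_vision: Sequence[bool],
    compression_ratio: float = 0.6,
) -> Tuple[List[int], List[Sequence[float]], List[Sequence[float]]]:
    """Hypothetical score-based eviction (NOT KVCapsule's algorithm).

    Drops the lowest-scoring `compression_ratio` fraction of vision-token
    KV entries and keeps every text-token entry. Returns the kept positions
    (in original order) plus the compressed keys and values.
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    n = len(keys)
    if not (len(values) == len(attn_scores) == len(is_vision) == n):
        raise ValueError("all inputs must have the same length")

    # Only vision positions are candidates for eviction (assumption).
    vision_idx = [i for i in range(n) if is_vision[i]]
    n_drop = math.floor(len(vision_idx) * compression_ratio)

    # Rank vision positions by accumulated attention, lowest first;
    # sorted() is stable, so ties drop the earlier position first.
    ranked = sorted(vision_idx, key=lambda i: attn_scores[i])
    dropped = set(ranked[:n_drop])

    # Keep everything else, preserving the original sequence order.
    kept = [i for i in range(n) if i not in dropped]
    return kept, [keys[i] for i in kept], [values[i] for i in kept]
```

For example, with 2 text tokens, 10 vision tokens, 2 more text tokens, and `compression_ratio=0.6`, the sketch evicts the six vision entries with the lowest scores, keeps the other four, and leaves all four text entries in place. A fixed per-step ratio like this is only one possible design. A real method could instead exploit the attention structure directly, for example by merging similar vision entries rather than discarding them.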
Vision · Reasoning · Infrastructure · Benchmarks

Summary generated by Claude — human-verified