KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy
Signal: 78
Hype: 25
In three lines
KVCapsule compresses the KV cache in vision-language models during autoregressive decoding. The method exploits structured attention patterns in vision tokens to achieve a 2x TPS improvement and a 2.4x memory reduction at a 60% compression ratio, without modifying the pretrained backbone.
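To put the memory figure in context, here is a rough back-of-the-envelope sketch of KV cache sizing. It is not taken from the paper. It assumes that "60% compression ratio" means 60% of cached tokens are dropped, which the summary does not confirm, and the model shape (`num_layers`, `num_kv_heads`, `head_dim`) and the helper names are hypothetical values chosen only for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    """Bytes needed to store keys and values for num_tokens cached tokens."""
    # The factor 2 covers one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


def reduction_factor(full_tokens, drop_fraction, **shape):
    """Ratio of the full cache size to the size after dropping a fraction of tokens."""
    # ASSUMPTION: compression removes whole tokens from the cache.
    kept_tokens = round(full_tokens * (1.0 - drop_fraction))
    full = kv_cache_bytes(num_tokens=full_tokens, **shape)
    kept = kv_cache_bytes(num_tokens=kept_tokens, **shape)
    return full / kept


# Hypothetical model shape and sequence length, for illustration only.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128)
full = kv_cache_bytes(num_tokens=4096, **shape)
factor = reduction_factor(4096, 0.60, **shape)
print(f"full cache: {full / 2**20:.0f} MiB, ideal reduction: {factor:.2f}x")
```

Under that assumption the ideal reduction is about 2.5x, the same order as the reported 2.4x. The summary does not say how the ratio is defined or what accounts for the difference.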
Summary generated by Claude — human-verified