arXiv cs.LG

Tensor Cache: Eviction-conditioned Associative Memory for Transformers

Signal: 78
Hype: 15
In three lines: Tensor Cache introduces a two-level cache for transformers: sliding-window local attention (L1) plus a fixed-size outer-product fast-weight memory (L2) that stores evicted KV pairs as a matrix. A learned gate fuses the outputs of the two levels. The design improves the memory-quality tradeoff on long-context models.
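
To make the two-level idea concrete, here is a minimal single-head sketch in plain Python. It is not the paper's implementation: the class name `TwoLevelCache`, its methods, the unnormalized memory update, and the fixed scalar `gate` (standing in for the learned gate) are all assumptions made for illustration.

```python
import math
from collections import deque


class TwoLevelCache:
    """Toy sketch: sliding-window attention (L1) plus outer-product memory (L2)."""

    def __init__(self, dim, window, gate=0.5):
        self.dim = dim
        self.window = window
        # Assumption: a fixed scalar replaces the learned gate from the summary.
        self.gate = gate
        # L1: the most recent (key, value) pairs.
        self.local = deque()
        # L2: fixed-size dim x dim matrix; memory[i][j] accumulates value[i] * key[j].
        self.memory = [[0.0] * dim for _ in range(dim)]

    def write(self, key, value):
        """Append a KV pair; fold the oldest pair into L2 once the window is full."""
        self.local.append((key, value))
        if len(self.local) > self.window:
            old_key, old_value = self.local.popleft()
            for i in range(self.dim):
                for j in range(self.dim):
                    self.memory[i][j] += old_value[i] * old_key[j]

    def read_local(self, query):
        """Softmax attention over the pairs still held in the window."""
        if not self.local:
            return [0.0] * self.dim
        scale = math.sqrt(self.dim)
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key, _ in self.local]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [sum(w * value[i] for w, (_, value) in zip(weights, self.local)) / total
                for i in range(self.dim)]

    def read_memory(self, query):
        """Associative lookup: multiply the memory matrix by the query."""
        return [sum(self.memory[i][j] * query[j] for j in range(self.dim))
                for i in range(self.dim)]

    def read(self, query):
        """Blend the two levels with the scalar gate."""
        local_out = self.read_local(query)
        memory_out = self.read_memory(query)
        g = self.gate
        return [g * a + (1.0 - g) * b for a, b in zip(local_out, memory_out)]


cache = TwoLevelCache(dim=2, window=1)
cache.write([1.0, 0.0], [0.0, 5.0])  # evicted into L2 by the next write
cache.write([0.0, 1.0], [7.0, 0.0])  # stays in the L1 window
recalled = cache.read_memory([1.0, 0.0])  # query matching the evicted key
blended = cache.read([1.0, 0.0])
```

The point of the sketch is that L2 memory stays at `dim * dim` numbers no matter how many pairs are evicted, whereas a plain KV cache grows with sequence length. Recall from L2 is exact only when the stored keys are orthogonal, as in the usage lines above; with correlated keys the lookups interfere.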
Reasoning · Infrastructure · Benchmarks

Summary generated by Claude — human-verified