arXiv cs.LG

Interdomain Attention: Beyond Token-Level Key-Value Memory

Signal: 78 · Hype: 15
In three lines: Interdomain Attention merges transformers and state space models (SSMs) via kernel methods. Attention features are projected onto basis functions maintained by an SSM, which enables query-conditioned attention over a fixed-size state. In FineWeb-Edu experiments at 125M to 1.3B parameters, it outperforms softmax-attention baselines at 1.3B on validation perplexity and commonsense tasks, and its results stay flat across input lengths up to 3.5× the training context.
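The core idea (features projected onto basis functions held in an SSM state, then read out by a query) can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's method: the exponential-decay basis, the log-spaced decay rates, the hypothetical class `FixedStateAttention`, and the choice to take the softmax over basis slots rather than over tokens. It only shows that a diagonal linear SSM can keep key and value summaries in a state whose size does not depend on sequence length, and that a query can attend over that state.

```python
import numpy as np


def decay_rates(num_basis: int) -> np.ndarray:
    # Hypothetical basis: exponential-decay kernels exp(-rate * age),
    # with log-spaced rates so slots cover short to long time scales.
    return np.logspace(-3, 0, num_basis)


class FixedStateAttention:
    """Toy query-conditioned attention over a fixed-size SSM state.

    Assumption for illustration only: each of `num_basis` slots stores key
    and value summaries projected onto one exponential-decay basis function.
    The update is a diagonal linear SSM, so memory stays (num_basis, dim).
    """

    def __init__(self, dim: int, num_basis: int = 8):
        self.dim = dim
        self.decay = np.exp(-decay_rates(num_basis))  # per-slot factor in (0, 1)
        self.k_state = np.zeros((num_basis, dim))
        self.v_state = np.zeros((num_basis, dim))

    def update(self, k: np.ndarray, v: np.ndarray) -> None:
        # Diagonal SSM step: decay every slot, then add the new token.
        self.k_state = self.decay[:, None] * self.k_state + k[None, :]
        self.v_state = self.decay[:, None] * self.v_state + v[None, :]

    def read(self, q: np.ndarray) -> np.ndarray:
        # Softmax over basis slots (not over past tokens), conditioned on q.
        scores = self.k_state @ q / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ self.v_state


rng = np.random.default_rng(0)
layer = FixedStateAttention(dim=4, num_basis=8)
for _ in range(1000):
    k, v = rng.normal(size=(2, 4))
    layer.update(k, v)
out = layer.read(rng.normal(size=4))
print(out.shape, layer.k_state.shape)  # prints "(4,) (8, 4)"
```

Because each slot is a decaying sum, the recurrent update equals an explicit kernel-weighted sum over all past tokens, while memory stays at `num_basis × dim` however long the input runs.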
Reasoning · Benchmarks · Papers

Summary generated by Claude — human-verified