arXiv cs.LG·26 May 2026

Interdomain Attention: Beyond Token-Level Key-Value Memory

In three lines: Interdomain Attention merges transformers and state space models via kernel methods. Attention features are projected onto basis functions maintained by an SSM, enabling query-conditioned attention over a fixed-size state. On FineWeb-Edu (125M–1.3B parameters), it outperforms softmax baselines at 1.3B on validation perplexity and commonsense tasks, with length-flat behavior up to 3.5× the training context.

Reasoning Benchmarks Papers

Summary generated by Claude — human-verified
