Block-Based Double Decoders
Signal
78
Hype
15
In three linesNovel transformer architecture 'block-based double decoders' combining decoder-only training efficiency with encoder-decoder inference gains. Reduces KV-cache memory and per-token compute by at least 2/3 at inference, while maintaining full loss supervision and static sequence packing during training.Read source
Your take?
Summary generated by Claude — human-verified