arXiv cs.AI·19 May 2026

Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

In three lines: Orthrus combines the fidelity of autoregressive LLMs with the parallel token generation of diffusion models through a dual-architecture framework. A lightweight trainable module augments a frozen Transformer to enable parallel generation while maintaining exact autoregressive quality. The method achieves up to a 7.8x speedup with O(1) memory overhead.

Reasoning, Code generation, Infrastructure

Summary generated by Claude — human-verified
