Reddit r/LocalLLaMA

Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB consumer GPU (RTX 5090M) — 3.4× the dense 27B variant on the same image

Signal 82 · Hype 15
In three lines: Qwen3.6 35B-A3B with MTP (multi-token prediction) reaches 249 t/s on an RTX 5090M (24GB), 3.4× the throughput of the dense 27B variant. The MoE architecture (128 experts, ~3B active params per token) combined with MTP (86.6% draft acceptance) explains the speedup. Context scales up to 262K tokens with minimal degradation.
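As a rough sanity check on those figures, here is a minimal Python sketch of the arithmetic they imply. The input numbers come from the summary. The helper `expected_tokens_per_step` and the draft length `k` are illustrative assumptions, and the independence model for draft acceptance is a simplification, not something the source post states.

```python
# Figures quoted in the summary.
moe_tps = 249.0         # tokens/s, Qwen3.6 35B-A3B with MTP
speedup = 3.4           # stated ratio over the dense 27B variant
total_params_b = 35.0   # "35B" in the model name
active_params_b = 3.0   # "~3B active params per token"
acceptance = 0.866      # MTP draft acceptance rate

# Throughput of the dense 27B variant implied by the 3.4x ratio.
dense_tps = moe_tps / speedup

# Share of the weights touched per token in the MoE model.
active_fraction = active_params_b / total_params_b


def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per verification step with k draft tokens.

    Assumes each draft token is accepted independently with probability p
    and that acceptance stops at the first rejection (hypothetical model).
    """
    return sum(p ** i for i in range(k + 1))


if __name__ == "__main__":
    print(f"implied dense throughput: {dense_tps:.1f} t/s")
    print(f"active parameter share:   {active_fraction:.1%}")
    # k = 1 is an assumed draft length; the source does not give one.
    print(f"tokens per step (k=1):    {expected_tokens_per_step(acceptance, 1):.3f}")
```

Under these assumptions the dense variant would sit at roughly 73 t/s, and a single draft token at 86.6% acceptance would yield a bit under two tokens per forward pass, which is in line with the summary's attribution of the speedup to MoE sparsity plus MTP.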
Qwen · Code generation · Benchmarks · Open source · Infrastructure

Summary generated by Claude — human-verified