Reddit r/LocalLLaMA · 23 May 2026

Qwen3.6 35B-A3B MTP hits 249 t/s on a 24GB consumer GPU (RTX 5090M) — 3.4× the dense 27B variant on the same image


In three lines: Qwen3.6 35B-A3B MTP reaches 249 t/s on an RTX 5090M (24GB), 3.4× the throughput of the dense 27B variant on the same image. The MoE architecture (128 experts, ~3B active parameters per token) combined with MTP, i.e. multi-token prediction (86.6% draft acceptance), explains the speedup. Context scales up to 262K tokens with minimal degradation.
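As a rough sanity check on those figures, the sketch below works through two pieces of arithmetic: the dense 27B throughput implied by the 3.4× ratio, and the expected number of tokens emitted per verification pass under speculative-style drafting. It assumes the 86.6% figure is a per-token acceptance rate and that acceptances are independent; the draft length is a hypothetical parameter, since the summary does not state how many tokens MTP drafts per step. The function names are illustrative only.

```python
def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    `acceptance`, and that the pass always yields one token from the main
    model in addition to the accepted drafts (the usual geometric-series
    estimate for speculative decoding).
    """
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance == 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance ** (draft_len + 1)) / (1.0 - acceptance)


def implied_baseline(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a reported throughput and ratio."""
    return throughput / speedup


# Figures reported in the summary above.
REPORTED_TPS = 249.0
REPORTED_RATIO = 3.4
REPORTED_ACCEPTANCE = 0.866

if __name__ == "__main__":
    dense_tps = implied_baseline(REPORTED_TPS, REPORTED_RATIO)
    print(f"Implied dense 27B throughput: {dense_tps:.1f} t/s")
    # draft_len values are hypothetical; the source does not give one.
    for draft_len in (1, 2, 3):
        tokens = expected_tokens_per_step(REPORTED_ACCEPTANCE, draft_len)
        print(f"draft_len={draft_len}: {tokens:.2f} tokens per pass")
```

Under these assumptions, MTP alone would not account for the full 3.4× gap, which is consistent with the summary attributing the speedup to the combination of sparse activation (~3B active parameters per token) and drafting.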


Qwen · Code generation · Benchmarks · Open source · Infrastructure

Summary generated by Claude — human-verified

