Reddit r/LocalLLaMA · 20 May 2026

RTX 5080 16GB: Qwen3.6 35B MoE at 128k context — 56 tok/s, and why MTP doesn't help

In three lines:
RTX 5080 16GB benchmark of Qwen3.6 35B MoE at 128k context: 56 tok/s without MTP, and 74 tok/s with MTP enabled but slower overall.
MTP forces a 1.5GB buffer that pushes 3 expert layers from GPU to CPU, creating a bottleneck.
The 27B model at IQ3 quantization reaches 73 tok/s and fits entirely on the GPU.
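The bottleneck argument is essentially VRAM arithmetic, and a toy model makes it concrete. This is a minimal sketch under loudly stated assumptions: VRAM is taken to be already full when the buffer is requested, expert layers are evicted whole, each token passes through every layer in sequence, and CPU-resident layers are simply slower per layer. The function names, the 40-layer count, the per-layer timings, and the 512 MiB layer size are all hypothetical. The 512 MiB figure is just the reported 1.5GB buffer (treated as 1536 MiB) divided naively across the 3 offloaded layers; none of these values come from the post.

```python
def layers_to_offload(free_vram_mib: int, buffer_mib: int, layer_mib: int) -> int:
    """Return how many whole expert layers must leave the GPU so a buffer fits.

    All arguments are hypothetical inputs in MiB:
      free_vram_mib: VRAM still free after every layer is resident
      buffer_mib:    extra buffer a feature (here, MTP) insists on reserving
      layer_mib:     size of one expert layer's weights
    """
    if layer_mib <= 0:
        raise ValueError("layer_mib must be positive")
    shortfall = buffer_mib - free_vram_mib
    if shortfall <= 0:
        return 0  # the buffer fits in spare VRAM; nothing is evicted
    # Integer ceiling division: a layer that is only partly needed still moves whole.
    return -(-shortfall // layer_mib)


def decode_tokens_per_second(n_layers: int, n_cpu_layers: int,
                             gpu_ms_per_layer: float, cpu_ms_per_layer: float) -> float:
    """Toy decode-speed model (an assumption, not a measurement): every token
    visits every layer in sequence, so each offloaded layer swaps its GPU cost
    for a slower CPU cost."""
    gpu_layers = n_layers - n_cpu_layers
    ms_per_token = gpu_layers * gpu_ms_per_layer + n_cpu_layers * cpu_ms_per_layer
    return 1000.0 / ms_per_token


if __name__ == "__main__":
    # Hypothetical inputs: VRAM already full, 1536 MiB buffer, 512 MiB per expert layer.
    evicted = layers_to_offload(free_vram_mib=0, buffer_mib=1536, layer_mib=512)
    print("expert layers pushed to CPU:", evicted)
    # Hypothetical 40-layer model; CPU layers assumed 5x slower than GPU layers.
    all_gpu = decode_tokens_per_second(40, 0, 0.4, 2.0)
    offloaded = decode_tokens_per_second(40, evicted, 0.4, 2.0)
    print(f"all on GPU: {all_gpu:.1f} tok/s, with offload: {offloaded:.1f} tok/s")
```

The sketch deliberately ignores whatever speedup MTP itself provides. It only shows why a handful of CPU-resident layers can dominate per-token time, which is the trade-off the post describes.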

Summary generated by Claude — human-verified