Reddit r/LocalLLaMA

Experiment: MTP models just as t/s efficient as non-MTP models?

Signal: 62 · Hype: 25
In three lines
Benchmark on a 9070 XT GPU: Qwen 35B A3B with MTP reaches 43.74 t/s vs. 38.07 t/s in standard mode.
MTP delivers a ~15% throughput gain despite the overhead of multi-token prediction.
Test conditions were identical for both runs: same prompt, 8192-token context, Q4_K_XL quantization.
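The ~15% figure follows directly from the two reported rates. Below is a minimal sketch of that arithmetic, using only the numbers from the summary; the function names are illustrative and not taken from the original post or any benchmarking tool.

```python
# Reported decode throughput from the benchmark summary (tokens/second).
MTP_TPS = 43.74       # Qwen 35B A3B with MTP enabled
BASELINE_TPS = 38.07  # same model, standard (non-MTP) decoding


def relative_gain(new_tps: float, old_tps: float) -> float:
    """Fractional throughput improvement of new over old (0.15 means 15%)."""
    return new_tps / old_tps - 1.0


def ms_per_token(tps: float) -> float:
    """Average wall-clock time per generated token, in milliseconds."""
    return 1000.0 / tps


if __name__ == "__main__":
    gain = relative_gain(MTP_TPS, BASELINE_TPS)
    print(f"throughput gain: {gain:.1%}")
    print(f"baseline: {ms_per_token(BASELINE_TPS):.2f} ms/token")
    print(f"MTP:      {ms_per_token(MTP_TPS):.2f} ms/token")
```

Run as-is, it prints a gain of about 14.9%, i.e. roughly 26.27 ms vs. 22.86 ms per generated token.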
Qwen · Benchmarks · Code generation · Open source

Summary generated by Claude (human-verified)