Reddit r/LocalLLaMA

MTP has no impact on my Qwen3.6 MoE performance

Signal: 35
Hype: 15
In three lines: A user reports that MTP (Multi-Token Prediction) gives no performance improvement for Qwen3.6-35B GGUF on an RTX 5060 Ti, with ~60 tok/s in both cases. They tested with the unsloth flags (spec-type draft-mtp, spec-draft-n-max 2) but observed no speedup, even after reducing ctx-size and quantization.
Qwen, Open source, Tools

Summary generated by Claude — human-verified