Reddit r/LocalLLaMA

Qwen 3.6 27B MTP - Adding spec-type and spec-draft-n-max is dropping tps and reducing GPU utilization

Signal: 45, Hype: 15
In three lines: A user reports a performance regression with Qwen 3.6 27B. Enabling spec-type draft-mtp together with spec-draft-n-max cuts throughput from 70 t/s to 30 t/s and GPU power draw from 475 W to 300 W, even though the draft acceptance rate is above 50%. The issue appeared after a recent llama.cpp update.
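
A back-of-the-envelope model may help show why an acceptance rate above 50% does not by itself guarantee a speedup. The sketch below is an illustration only, not how llama.cpp measures anything. It assumes each drafted token is accepted independently with probability `alpha` (the standard simplification for speculative decoding), and it uses a hypothetical draft length of 3, since the summary does not give the spec-draft-n-max value. Both function names are made up for this example.

```python
def expected_tokens_per_step(alpha: float, n_draft: int) -> float:
    """Expected tokens emitted per verification step.

    Assumption: each drafted token is accepted independently with
    probability alpha, and the target model always contributes one
    token after the accepted prefix.
    """
    if alpha >= 1.0:
        return float(n_draft + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^n_draft
    return (1.0 - alpha ** (n_draft + 1)) / (1.0 - alpha)


def implied_step_cost(baseline_tps: float, spec_tps: float,
                      alpha: float, n_draft: int) -> float:
    """How many times longer one speculative step must take than one
    plain decode step to explain the observed throughput."""
    return expected_tokens_per_step(alpha, n_draft) * baseline_tps / spec_tps


# 70 t/s and 30 t/s come from the post; alpha = 0.5 is the reported lower
# bound on acceptance; n_draft = 3 is a hypothetical value.
tokens = expected_tokens_per_step(0.5, 3)
cost = implied_step_cost(70.0, 30.0, 0.5, 3)
print(f"tokens per step: {tokens:.3f}, implied step cost: {cost:.3f}x")
```

Under these assumptions each speculative step yields fewer than two tokens on average, so the reported drop would imply that a step takes several times longer than a plain decode step. That would be consistent with the lower GPU power draw, though the summary does not establish the cause.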
Tags: Qwen, Open source, Code generation, Infrastructure

Summary generated by Claude — human-verified