Qwen 3.6 27B MTP - Adding spec-type and spec-draft-n-max is dropping tps and reducing GPU utilization
Signal
45
Hype
15
In three lines
A user reports a performance regression with Qwen 3.6 27B: enabling spec-type draft-mtp together with spec-draft-n-max cuts throughput from 70 t/s to 30 t/s and GPU power draw from 475 W to 300 W, even though the draft acceptance rate is above 50%. The issue appeared after a recent llama.cpp update.
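To put the reported numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses the standard simplified model of speculative decoding, in which each drafted token is accepted independently with the same probability, so one verification pass over n drafted tokens yields (1 - a^(n+1)) / (1 - a) tokens on average. The function names are made up for illustration, the draft length of 3 is an assumed value (the summary does not state what spec-draft-n-max was set to), and the acceptance rate reported by llama.cpp may not be defined exactly the way this model assumes.

```python
def expected_tokens_per_step(accept_rate: float, n_draft: int) -> float:
    """Expected tokens produced per target-model verification pass.

    Simplifying assumption: each drafted token is accepted independently
    with probability accept_rate, and the pass always yields one extra
    token from the target model itself.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if n_draft < 0:
        raise ValueError("n_draft must be non-negative")
    if accept_rate == 1.0:
        return float(n_draft + 1)
    return (1.0 - accept_rate ** (n_draft + 1)) / (1.0 - accept_rate)


def implied_step_slowdown(base_tps: float, spec_tps: float,
                          accept_rate: float, n_draft: int) -> float:
    """How much slower one speculative step must be than one plain decode
    step to explain the observed throughputs under the model above."""
    tokens_per_step = expected_tokens_per_step(accept_rate, n_draft)
    return tokens_per_step * base_tps / spec_tps


if __name__ == "__main__":
    accept_rate = 0.5  # lower bound taken from the ">50%" in the report
    n_draft = 3        # ASSUMED draft length; not given in the report

    gain = expected_tokens_per_step(accept_rate, n_draft)
    slowdown = implied_step_slowdown(70.0, 30.0, accept_rate, n_draft)
    print(f"expected tokens per step: {gain:.3f}")
    print(f"implied per-step slowdown: {slowdown:.2f}x")
```

Under these assumptions, speculation should only lose if a draft-plus-verify step costs more than about 1.9 times a plain decode step. The reported drop from 70 t/s to 30 t/s would imply a step roughly four times slower, which fits the reported fall in GPU power draw and suggests the GPU is waiting on something rather than doing extra useful work. This is an estimate from the summary's figures only, not a diagnosis.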
Summary generated by Claude — human-verified