Reddit r/LocalLLaMA · 6 June 2026

Qwen 3.6 27B MTP - Adding spec-type and spec-draft-n-max is dropping tps and reducing GPU utilization

In three lines

A user reports a performance regression with Qwen 3.6 27B: enabling spec-type draft-mtp together with spec-draft-n-max cuts throughput from 70 t/s to 30 t/s and GPU power draw from 475 W to 300 W, despite an acceptance rate above 50%. The issue appeared after a recent llama.cpp update.
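
To put the reported figures in relative terms, here is a minimal Python sketch. It uses only the four numbers quoted in the summary; the variable names and the tokens-per-joule metric are illustrative choices, not something taken from the original post.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a drop)."""
    return (after - before) / before


# Figures as reported in the summary; nothing here is measured independently.
tps_off, tps_on = 70.0, 30.0        # tokens per second without / with speculative decoding
watts_off, watts_on = 475.0, 300.0  # GPU power draw in watts without / with

tps_change = relative_change(tps_off, tps_on)
power_change = relative_change(watts_off, watts_on)

# Tokens per joule = (tokens per second) / (joules per second).
eff_off = tps_off / watts_off
eff_on = tps_on / watts_on

print(f"throughput change: {tps_change:.1%}")
print(f"power change: {power_change:.1%}")
print(f"tokens per joule: {eff_off:.3f} -> {eff_on:.3f}")
```

On these numbers throughput falls by roughly 57% while power draw falls by roughly 37%, so tokens generated per joule also drops. That pattern fits the reduced GPU utilization named in the title, though the summary does not establish a cause.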

Qwen · Open source · Code generation · Infrastructure

Summary generated by Claude — human-verified
