I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong?
Signal: 35, Hype: 15
In three lines: User reports low draft acceptance (40-60%) with Qwen3.5-122B and Qwen3.6-27B when running speculative decoding in llama.cpp, versus ~80% expected. The post details the configuration: MTP draft, Q6_K_L quantization, batch size 2048.
Summary generated by Claude — human-verified
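To put the 40-60% versus ~80% acceptance gap in perspective, here is a minimal sketch of the standard speculative-decoding arithmetic. It assumes each drafted token is accepted independently with probability `alpha`, which is a simplification; an acceptance ratio reported by tooling only approximates this. The draft length of 3 is a hypothetical value chosen for illustration, not the setting from the post.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes (simplification) that each drafted token is accepted
    independently with probability `alpha`, and that the target model
    always contributes one extra token: a correction on the first
    rejection, or a bonus token if every draft token is accepted.
    This is the geometric sum 1 + alpha + ... + alpha**draft_len.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the bonus token.
        return float(draft_len + 1)
    return (1.0 - alpha ** (draft_len + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # draft_len=3 is a hypothetical draft length used only for illustration.
    for alpha in (0.4, 0.6, 0.8):
        tokens = expected_tokens_per_step(alpha, 3)
        print(f"alpha={alpha:.1f}: {tokens:.2f} tokens per verification pass")
```

Under these assumptions, dropping from 0.8 to 0.4 acceptance cuts the expected tokens per target pass from about 2.95 to about 1.62, which is why an acceptance rate in the 40-60% range can erase most of the expected speedup.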