How do I make MTP work in llama-server?
In three lines
User tests MTP (Multi-Token Prediction) on Qwen3.6-35B with llama.cpp on an RTX 3090. With MTP enabled (--spec-type draft-mtp), performance drops: prefill falls from 1082 t/s to 878 t/s (N=1) and generation from 116 t/s to 108 t/s. Draft acceptance rates are low (0.80 to 0.37). Seeks optimization advice.
Summary generated by Claude — human-verified
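A rough worked check of the numbers above may help frame the question. The sketch below computes the relative slowdowns from the quoted figures. It then applies the standard speculative-decoding estimate of tokens emitted per verification step, (1 - α^(k+1)) / (1 - α). That estimate assumes each of k drafted tokens is accepted independently with probability α. Treating llama.cpp's reported acceptance rate as α, and choosing k = 1, are illustrative assumptions and not taken from the post. The function names are hypothetical.

```python
# Throughput figures quoted in the summary (tokens/second, RTX 3090).
BASELINE = {"prefill": 1082.0, "generation": 116.0}
WITH_MTP = {"prefill": 878.0, "generation": 108.0}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent; negative means slower."""
    return (after - before) / before * 100.0


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step when
    k draft tokens are proposed and each is accepted independently with
    probability alpha (a simplifying i.i.d. assumption).
    """
    if alpha >= 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    for phase in BASELINE:
        delta = pct_change(BASELINE[phase], WITH_MTP[phase])
        print(f"{phase}: {delta:+.1f}%")
    # Draft length k = 1 is an assumption for illustration only.
    for alpha in (0.80, 0.37):
        print(f"alpha={alpha:.2f}: {expected_tokens_per_step(alpha, 1):.2f} tokens/step")
```

As written, it reports about an 18.9% prefill drop and a 6.9% generation drop. With k = 1, the model gives 1.80 tokens per step at α = 0.80 and 1.37 at α = 0.37. Drafting only pays off when those extra tokens per step outweigh the added cost of producing and verifying the drafts. The observed slowdown suggests that, in this setup, they do not.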