llama.cpp MTP support landed - Qwen3.6 27B at up to 2.44× on a Strix Halo, 2.17× on an RTX 3090 rig
MTP (multi-token prediction, used here for speculative decoding) support was merged into llama.cpp (PR #22673, May 16). Qwen3.6 27B benchmarks show a 1.81×–2.44× speedup on Strix Halo (ROCm) and 1.54×–2.17× on an RTX 3090. The 35B-A3B MoE model shows smaller gains (1.24×–1.40×). Enable it with --spec-type draft-mtp --spec-draft-n-max N.
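
For anyone scripting a benchmark sweep over N, here is a minimal Python sketch that assembles a command line from the two flags quoted above. The flag names come from the post. The helper name build_mtp_command, the model file name, the choice of N = 3, and the use of the llama-server binary with -m are illustrative assumptions, not something the PR prescribes.

```python
import shlex


def build_mtp_command(model_path: str, draft_n_max: int,
                      binary: str = "llama-server") -> list[str]:
    """Build an argv list that enables MTP drafting.

    Hypothetical helper: only --spec-type draft-mtp and
    --spec-draft-n-max are taken from the post; the binary name and
    model path are placeholders to adapt to your setup.
    """
    if draft_n_max < 1:
        raise ValueError("draft_n_max must be a positive integer")
    return [
        binary,
        "-m", model_path,                       # path to the GGUF model (placeholder)
        "--spec-type", "draft-mtp",             # flag quoted in the post
        "--spec-draft-n-max", str(draft_n_max), # N: max drafted tokens per step
    ]


if __name__ == "__main__":
    # Placeholder model file and N; pick N by measuring on your own hardware.
    cmd = build_mtp_command("qwen3.6-27b.gguf", 3)
    print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell-quoting problems when the model path contains spaces.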