I added native MTP to exo for Qwen3.6 MLX models; here are the exactness and speed results
Signal: 78
Hype: 15
In three lines
Contribution to exo: native multi-token prediction (MTP) support for Qwen3.6 MLX models.
Benchmarks: 2x speedup on the 27B model at K=2 and K=3, and 1.16x on 35B-A3B at K=1.
Exactness is verified: under greedy decoding the MTP path produces token IDs identical to the plain greedy path, and under sampling it uses speculative probability-ratio acceptance.
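The two exactness checks mentioned in the summary can be illustrated with a minimal sketch of standard speculative decoding. This is not exo's code: the function names `greedy_accept_count` and `accept_or_resample` are hypothetical, and the probability lists stand in for the softmax outputs of the draft (MTP) head and the main model. The sketch assumes the usual rule, where a drafted token x is kept with probability min(1, p(x)/q(x)) and otherwise replaced by a sample from the normalized residual max(0, p - q).

```python
import random


def greedy_accept_count(target_argmax, draft_tokens):
    """Count how many drafted tokens match the main model's greedy choice.

    Under greedy decoding a draft token is kept only if it equals the
    token the main model would have picked, so the output is unchanged.
    """
    accepted = 0
    for target_tok, draft_tok in zip(target_argmax, draft_tokens):
        if target_tok != draft_tok:
            break
        accepted += 1
    return accepted


def accept_or_resample(p, q, draft_token, rng):
    """Probability-ratio acceptance for one drafted token.

    p: main-model probabilities, q: draft probabilities (same length).
    draft_token is assumed to have been sampled from q.
    Returns (token, accepted_flag).
    """
    q_x = q[draft_token]
    # A token with q_x == 0 cannot come from the draft; treat as ratio 1.
    ratio = p[draft_token] / q_x if q_x > 0 else 1.0
    if rng.random() < min(1.0, ratio):
        return draft_token, True
    # Rejected: sample from the residual distribution max(0, p - q).
    # Its total mass is positive whenever a rejection can occur.
    residual = [max(0.0, p_i - q_i) for p_i, q_i in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False


if __name__ == "__main__":
    rng = random.Random(0)
    print(greedy_accept_count([5, 9, 2], [5, 9, 7]))
    print(accept_or_resample([0.7, 0.3], [0.4, 0.6], 1, rng))
```

With this rule the accepted-or-resampled token is distributed according to p, which is why sampling with a draft head can match the main model's output distribution rather than merely approximate it.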
Summary generated by Claude — human-verified