Reddit r/LocalLLaMA · 23 May 2026

I added native MTP to exo for Qwen3.6 MLX models; here are the exactness and speed results

In three lines:
Contribution to exo: native multi-token prediction (MTP) support for Qwen3.6 MLX models.
Benchmarks: 2x speedup on the 27B model at K=2 and K=3, and 1.16x on 35B-A3B at K=1.
Exactness verified: greedy decoding with MTP produces token IDs identical to the standard greedy path, and sampling uses speculative probability-ratio acceptance.
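
The summary does not show exo's code, so what follows is only a generic sketch of the two verification modes it names, not the author's implementation. It assumes K is the number of tokens the MTP head drafts per step, and that the target model then scores every drafted position plus one extra position in a single pass. `greedy_verify` and `speculative_accept` are hypothetical names. The sampling function uses the standard speculative-sampling rule: accept a draft token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p − q).

```python
import random

# Hypothetical helpers illustrating generic speculative-decoding verification;
# this is not exo's code. Assumes an MTP head drafted `draft_tokens` and the
# target model then scored every draft position plus one extra position.


def greedy_verify(draft_tokens, target_argmax):
    """Greedy mode: keep drafts only while they equal the target's argmax.

    target_argmax[i] is the target model's argmax token at draft position i;
    it has len(draft_tokens) + 1 entries (the last is the "bonus" position).
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok != target_argmax[i]:
            # First mismatch: emit the target's own choice instead and stop.
            accepted.append(target_argmax[i])
            return accepted
        accepted.append(tok)
    # Every draft matched, so the target's prediction after them comes free.
    accepted.append(target_argmax[len(draft_tokens)])
    return accepted


def speculative_accept(draft_tokens, q_probs, p_probs, rng):
    """Sampling mode: standard probability-ratio acceptance test.

    q_probs[i]: draft distribution that draft_tokens[i] was sampled from.
    p_probs[i]: target distribution at position i (len(draft_tokens) + 1 rows).
    """
    out = []
    for i, x in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept x with probability min(1, p(x) / q(x)); q[x] > 0 since x was drawn from q.
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the residual max(0, p - q); random.choices
        # normalizes the weights itself. Then stop verifying this step.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # All drafts accepted: sample one bonus token from the target's last row.
    last = p_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


if __name__ == "__main__":
    # Toy numbers only, not benchmark data.
    print(greedy_verify([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
    rng = random.Random(0)
    print(speculative_accept([1, 1], [[0, 1, 0], [0, 1, 0]],
                             [[0, 1, 0], [0, 1, 0], [0, 0, 1]], rng))  # [1, 1, 2]
```

In greedy mode every emitted token is the target model's argmax for the same prefix, which is why comparing token IDs against the plain greedy path is a natural exactness check. Batched verification can still diverge through floating-point differences, so the comparison remains an empirical test. In sampling mode the ratio test preserves the target model's output distribution rather than reproducing an exact token sequence.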

Tags: Qwen, Open source, Code generation, Benchmarks

Summary generated by Claude — human-verified