PSA: You may not need to quantize spec draft when using MTP
Signal: 65
Hype: 15
In three lines
Quantizing spec draft in llama.cpp with MTP reduces available context size. With q4_0, context drops from 91648 tokens (default fp16) to 83200 tokens. Developer am17an confirmed this counterintuitive behavior.
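To put the two reported figures in proportion, here is a minimal sketch of the arithmetic. Only the two token counts come from the summary; the constant names and the `context_drop` helper are hypothetical, introduced purely for illustration.

```python
# Context sizes reported in the summary, in tokens.
CTX_FP16 = 91648  # default fp16 spec draft
CTX_Q4_0 = 83200  # q4_0-quantized spec draft


def context_drop(baseline: int, quantized: int) -> tuple[int, float]:
    """Return (tokens lost, percent lost) relative to the baseline."""
    lost = baseline - quantized
    return lost, 100.0 * lost / baseline


lost, pct = context_drop(CTX_FP16, CTX_Q4_0)
print(f"{lost} tokens lost ({pct:.1f}% of the fp16 context)")
# prints: 8448 tokens lost (9.2% of the fp16 context)
```

On these numbers, q4_0 costs roughly a tenth of the context available at the fp16 default.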
Summary generated by Claude — human-verified