Single 3090 with Q4 Qwen 27B, context dropped from 137k to 14k with MTP enabled. Is it normal?
Signal: 35
Hype: 15
In three lines
A user reports that the usable context size dropped sharply (137k → 14k) after enabling MTP (Multi-Token Prediction) speculative decoding for Qwen 27B Q4 in llama.cpp. They ask whether this behavior is expected.
Summary generated by Claude — human-verified