Gemma 4 QAT + MTP: max 33% speed increase in token generation, any ideas?
In three lines: A user with two RTX 3060 Ti GPUs tests Gemma 4 QAT with an MTP assistant (draft) model on llama.cpp. The setup reaches 100 t/s, a 33% speedup, with a draft acceptance rate above 80%. The user is looking for tuning ideas to push past that ceiling.
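The figures in the summary can be sanity-checked with a little arithmetic. The sketch below is not taken from the post. It applies the standard speculative-decoding estimate, in which each drafted token is accepted independently with probability alpha and gamma tokens are drafted per step. Reading the reported 80% as alpha is an assumption, and `gamma` and `cost_ratio` (draft cost relative to one target-model pass) are hypothetical values chosen only for illustration. Real llama.cpp throughput also depends on factors this model ignores.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model pass when gamma tokens are
    drafted and each is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over plain decoding. cost_ratio is the cost of one
    draft token relative to one target-model pass (hypothetical here)."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


# Baseline implied by the summary: 100 t/s is reported as a 33% speedup.
implied_baseline_tps = 100.0 / 1.33

if __name__ == "__main__":
    print(f"implied baseline: {implied_baseline_tps:.1f} t/s")
    # Hypothetical sweep: alpha = 0.8 is read from the summary, the rest is assumed.
    for gamma in (2, 4, 8):
        for cost_ratio in (0.1, 0.25, 0.5):
            s = estimated_speedup(0.8, gamma, cost_ratio)
            print(f"gamma={gamma} cost_ratio={cost_ratio}: ~{s:.2f}x")
```

Under these assumptions, the estimate is sensitive to the draft cost: a high acceptance rate yields only a modest speedup when each drafted token is expensive relative to a target-model pass, which is one possible explanation for a ceiling near 33%.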
Summary generated by Claude — human-verified