Reddit r/LocalLLaMA

Gemma 4 QAT + MTP: max 33% speed increase in token generation, any ideas?

Signal: 35 · Hype: 15
In three lines: A user with 2x RTX 3060 Ti tests Gemma 4 QAT with an MTP assistant (draft) model on llama.cpp. They reach 100 t/s in token generation, a 33% speedup, with a draft acceptance rate above 80%, and are looking for tuning ideas to push past that figure.
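The figures in the summary can be unpacked with some back-of-the-envelope arithmetic. The sketch below is an illustration, not the poster's method. It recovers the implied baseline speed from the reported 100 t/s and 33% speedup, then applies the standard idealized speculative-decoding estimate, which assumes each drafted token is accepted independently with the same probability. The draft length (`draft_len`) and the relative cost of a draft pass (`draft_cost_ratio`) are hypothetical parameters that do not appear in the source, and the acceptance rate llama.cpp reports may not match the per-token probability this model assumes.

```python
def baseline_tps(observed_tps: float, speedup_pct: float) -> float:
    """Implied speed without the draft model, given the observed speed and speedup."""
    return observed_tps / (1.0 + speedup_pct / 100.0)


def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Expected tokens produced per target-model verification pass.

    Idealized model: each drafted token is accepted independently with
    probability `acceptance`; one extra token always comes from the target.
    """
    if acceptance >= 1.0:
        return float(draft_len + 1)
    return (1.0 - acceptance ** (draft_len + 1)) / (1.0 - acceptance)


def ideal_speedup(acceptance: float, draft_len: int, draft_cost_ratio: float) -> float:
    """Idealized speedup; draft_cost_ratio is draft-pass time / target-pass time."""
    step_cost = draft_len * draft_cost_ratio + 1.0
    return expected_tokens_per_step(acceptance, draft_len) / step_cost


if __name__ == "__main__":
    # 100 t/s and 33% come from the summary; the rest are assumed values.
    print(f"implied baseline: {baseline_tps(100.0, 33.0):.1f} t/s")
    for k in (2, 4, 8):  # hypothetical draft lengths
        est = ideal_speedup(acceptance=0.8, draft_len=k, draft_cost_ratio=0.1)
        print(f"draft_len={k}: idealized speedup ~{est:.2f}x")
```

If the idealized estimate for a given setup comes out well above the observed 1.33x, that would suggest the limit lies in costs this simple model ignores. The source does not say which ones.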
Llama · Code generation · Open source · Infrastructure

Summary generated by Claude — human-verified