Reddit r/LocalLLaMA

Yay got Gemma 12B QAT working on old 1080ti (maybe with speculative decoding?)

In three lines: A user runs Gemma 12B QAT on a GTX 1080 Ti (9 years old) at 50 tok/sec. The setup includes speculative decoding with an MTP draft model and Q4_K_XL quantization. The poster is seeking further optimizations.
Gemini, Code generation, Open source, Infrastructure

Summary generated by Claude — human-verified