Back to feed
Reddit r/LocalLLaMA·

Benchmark & Reality Check on Gemma 4 12B: Great model, but your local settings are probably breaking it (Fix inside)

Signal
65
Hype
25
In three linesBenchmark of Gemma 4 12B on Python bug-hunting task. Model finds 6 bugs vs 14 for Qwen 35B. Default LM Studio settings disable reasoning. Fix: enable enable_thinking in Jinja template, set thought tokens (<|channel>thought / <channel|>), use temperature 1.0, top_p 0.95, top_k 64.
Read source
Your take?
GeminiBenchmarksReasoningCode generationTools

Summary generated by Claude — human-verified