Reddit r/LocalLLaMA

Running Gemma4 31b-it on vLLM 0.21.0 A100s (bad quality or what am I doing wrong)

In three lines: A user reports degraded output quality when running Gemma 4 31B-it locally on two A100s with vLLM 0.21.0, compared with the Google API. The model and parameters are reportedly the same (tensor-parallel-size 2, max-model-len 65536, structured output). Even so, the local deployment returns invalid JSON, while the API output is reported as perfect.
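One way to make a report like this measurable is to collect the raw responses from both deployments and count how many fail to parse as JSON. The following is a minimal sketch using only the Python standard library. The function name `invalid_json_rate` and the sample strings are hypothetical and made up for illustration; they are not taken from the original post.

```python
import json


def invalid_json_rate(outputs):
    """Return the fraction of response strings that fail to parse as JSON."""
    if not outputs:
        return 0.0
    failures = 0
    for text in outputs:
        try:
            json.loads(text)
        except json.JSONDecodeError:
            failures += 1
    return failures / len(outputs)


# Hypothetical sample responses (illustrative only, not from the post).
local_outputs = [
    '{"answer": 1}',        # valid
    '{"answer": 2',         # truncated object
    'Sure! {"answer": 3}',  # prose wrapped around the JSON
]
api_outputs = ['{"answer": 1}', '{"answer": 2}', '{"answer": 3}']

print("local invalid rate:", invalid_json_rate(local_outputs))
print("api invalid rate:", invalid_json_rate(api_outputs))
```

Running the same prompts through both deployments and comparing the two rates would show whether the failures are systematic (for example, truncation or extra prose around the JSON) rather than occasional.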
Gemini · Open source · Infrastructure

Summary generated by Claude — human-verified