Running Gemma 4 31B-it on vLLM 0.21.0 with A100s (bad quality, or what am I doing wrong?)
A user reports degraded output quality when running Gemma 4 31B-it locally on two A100s with vLLM 0.21.0, compared with Google's API. Despite using the same model and the same parameters (tensor-parallel-size 2, max-model-len 65536, structured output), the local setup produces invalid JSON, while the API's output is perfect.
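One way to turn "invalid JSON locally versus perfect via API" into a number is to collect raw completions from both backends for the same prompts and count how many parse as a JSON object with the expected fields. The sketch below uses only the Python standard library. The function names `is_valid_record` and `json_validity_rate`, the required keys, and the sample outputs are hypothetical placeholders and do not come from the original post.

```python
import json
from typing import Iterable


def is_valid_record(text: str, required_keys: set[str]) -> bool:
    """Return True if `text` parses as a JSON object containing every required key."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return False
    # Iterating a dict yields its keys, so issubset checks key presence.
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def json_validity_rate(outputs: Iterable[str], required_keys: set[str]) -> float:
    """Fraction of raw completions that are valid JSON objects with the required keys."""
    results = [is_valid_record(o, required_keys) for o in outputs]
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical completions; replace them with real outputs from each backend.
    required = {"title", "tags"}
    local_outputs = [
        '{"title": "A", "tags": ["x"]}',
        '{"title": "B", "tags": ["y"]',             # truncated: missing closing brace
        'Sure! Here is the JSON: {"title": "C"}',   # prose prefix, not pure JSON
    ]
    api_outputs = [
        '{"title": "A", "tags": ["x"]}',
        '{"title": "B", "tags": ["y"]}',
        '{"title": "C", "tags": []}',
    ]
    print(f"local: {json_validity_rate(local_outputs, required):.2f}")  # local: 0.33
    print(f"api:   {json_validity_rate(api_outputs, required):.2f}")    # api:   1.00
```

Running the same prompt set through both backends and comparing the two rates makes it easier to tell whether a configuration change on the vLLM side actually closes the gap, rather than judging from a handful of failures.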
Summary generated by Claude — human-verified