Gemma 4 E2B quality degrades after ~30-40 continuous inferences on 4 GB VRAM?
Signal: 35
Hype: 15
In three lines
A user reports that output quality from Gemma 4 E2B degrades after 30-40 continuous inferences on a 4 GB GPU (GTX 1650). Responses become shorter, JSON fields go missing, and the output is sometimes empty. Restarting llama-server fixes it. The suspected cause is a KV cache or memory fragmentation issue.
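Since the only fix reported is restarting llama-server, one way to work around the problem is to watch for the reported symptoms (empty output, missing JSON fields) and restart proactively. The following is a minimal sketch of that idea, not the poster's setup: the field names in `REQUIRED_FIELDS`, the thresholds, and the `RestartPolicy` class are all hypothetical, and the restart itself is left to the caller.

```python
import json

# Hypothetical schema: replace with the fields your prompts are expected to return.
REQUIRED_FIELDS = ("title", "summary")


def is_degraded(raw, required=REQUIRED_FIELDS):
    """Return True if the output is empty, not a JSON object, or missing a required field."""
    if not raw or not raw.strip():
        return True
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return True
    if not isinstance(data, dict):
        return True
    return any(field not in data for field in required)


class RestartPolicy:
    """Recommend a restart after max_requests total responses,
    or after max_bad consecutive degraded responses (assumed thresholds)."""

    def __init__(self, max_requests=25, max_bad=2):
        self.max_requests = max_requests
        self.max_bad = max_bad
        self.count = 0
        self.bad = 0

    def record(self, raw):
        """Record one model response; return True if a restart is recommended."""
        self.count += 1
        # Count consecutive degraded outputs; a good output resets the streak.
        self.bad = self.bad + 1 if is_degraded(raw) else 0
        return self.count >= self.max_requests or self.bad >= self.max_bad

    def reset(self):
        """Call after the server has been restarted."""
        self.count = 0
        self.bad = 0
```

In use, the caller would pass each response body to `record()` and, when it returns True, restart the server process by whatever means it was launched, then call `reset()`. The request cap of 25 is only a guess placed below the reported 30-40 range.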
Summary generated by Claude — human-verified