Reddit r/LocalLLaMA

What's actually happening when a model spills out of VRAM into system memory?

Signal: 35
Hype: 15
In three lines: Technical discussion of VRAM overflow mechanics in llama.cpp. The user runs Gemma-4 26B (21 GB) on an RX 6600 XT with a Ryzen 7 5700X and 32 GB of RAM, achieving ~20 tokens/s decode. The question: how is the CPU/GPU split handled, and what roles do PCIe speed and the CPU play?
Llama, Code generation, Infrastructure, AI Agents

Summary generated by Claude — human-verified