Reddit r/LocalLLaMA

advice for dual-gpu asymmetric

Signal: 35
Hype: 15
In three lines: A user running an RTX 3080 Ti 12GB alongside an RTX 3080 20GB is tuning asymmetric dual-GPU inference. Gemma 4 31B Q4_K_XL reaches 20 t/s with the standard cache and 70 t/s when the K/V cache is compressed to q4_0. The user asks how GGUF memory expansion works and wants advice on dual-GPU configuration.
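A rough sketch of the arithmetic behind the two knobs mentioned in the summary: how much a q4_0 K/V cache shrinks relative to f16, and how a 12 GB + 20 GB pair might be split in proportion to VRAM. The model dimensions below (`n_layers`, `n_kv_heads`, `head_dim`, `n_ctx`) are hypothetical placeholders, not the real Gemma configuration, and the proportional split is only an assumed starting point, not something the post confirms. The bits-per-element figures follow the GGML block layouts (q4_0 stores 32 values in 18 bytes, q8_0 stores 32 values in 34 bytes).

```python
# Bits per cached element for common cache types.
# f16: 16 bits. q8_0: 34 bytes per 32 values. q4_0: 18 bytes per 32 values.
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Estimate K+V cache size for a plain full-attention transformer.

    Ignores sliding-window layers and any per-backend padding, so treat the
    result as an upper-bound style estimate, not a measurement.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BITS_PER_ELEMENT[cache_type] / 8


def split_by_vram(vram_gb):
    """Return per-GPU fractions proportional to each card's VRAM."""
    total = sum(vram_gb)
    return [v / total for v in vram_gb]


if __name__ == "__main__":
    # HYPOTHETICAL model shape, chosen only to make the numbers concrete.
    shape = dict(n_layers=60, n_kv_heads=8, head_dim=128, n_ctx=32768)

    for cache_type in ("f16", "q8_0", "q4_0"):
        gib = kv_cache_bytes(cache_type=cache_type, **shape) / 2**30
        print(f"{cache_type}: {gib:.2f} GiB")

    # 3080 Ti 12GB + 3080 20GB, in the order the GPUs are enumerated.
    print(split_by_vram([12, 20]))
```

If the runtime is llama.cpp (an assumption; the summary only mentions GGUF), the fractions would correspond to the `--tensor-split` option and the cache types to `--cache-type-k` / `--cache-type-v`. A VRAM-proportional split ignores the difference in compute and memory bandwidth between the two cards, so it is worth benchmarking a few ratios around it.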
Llama · Code generation · Infrastructure

Summary generated by Claude — human-verified