Reddit r/LocalLLaMA · 11 June 2026

advice for dual-gpu asymmetric

In three lines

A user with an RTX 3080 Ti 12GB and an RTX 3080 20GB is optimizing asymmetric dual-GPU inference. Gemma 4 31B Q4_K_XL reaches 20 t/s with the standard cache and 70 t/s when the K/V cache is quantized to q4_0. The user seeks clarification on GGUF memory expansion and advice on dual-GPU configuration.
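As a rough illustration of why the cache type matters on cards this size, the sketch below estimates K/V cache memory for f16 versus q4_0 and computes a VRAM-proportional split for the two GPUs. The model shape (`n_layers`, `n_kv_heads`, `head_dim`, `n_ctx`) is hypothetical and is not the real Gemma 4 31B configuration, and the estimate assumes every layer caches the full context, which may overstate the real figure. The function names are made up for this example. The q4_0 figure relies on the GGML block layout of 32 values stored in 18 bytes.

```python
# Approximate bytes per cached value for two GGML cache types.
# f16: 2 bytes per value.
# q4_0: blocks of 32 values in 18 bytes (16 bytes of 4-bit values + 2-byte scale).
BYTES_PER_ELEMENT = {"f16": 2.0, "q4_0": 18 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Estimate combined K and V cache size, ignoring padding and overhead."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BYTES_PER_ELEMENT[cache_type]


def vram_split(vram_gb):
    """Return per-GPU fractions proportional to each card's VRAM."""
    total = sum(vram_gb)
    return [v / total for v in vram_gb]


# Hypothetical model shape, chosen only to make the arithmetic concrete.
shape = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=32768)

f16_bytes = kv_cache_bytes(**shape, cache_type="f16")
q4_bytes = kv_cache_bytes(**shape, cache_type="q4_0")
split = vram_split([12, 20])  # the two cards from the post, in GB

print(f"f16 cache:  {f16_bytes / 1024**3:.2f} GiB")
print(f"q4_0 cache: {q4_bytes / 1024**3:.2f} GiB")
print(f"split:      {split}")
```

Under these assumptions the q4_0 cache is about 3.6 times smaller than the f16 cache, which may be what lets more of the model stay in VRAM, though the post does not confirm the cause of the speedup. If the runtime is llama.cpp (an assumption, since the summary only mentions GGUF), the split fractions correspond to the kind of ratio passed to `--tensor-split`, and the cache types to `--cache-type-k` and `--cache-type-v`.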

Llama Code generation Infrastructure

Summary generated by Claude — human-verified
