Reddit r/LocalLLaMA

Comparing dual-GPU inference speed between llama.cpp row/tensor split and ik_llama graph split

Signal: 45 · Hype: 15
In three lines:
Dual-GPU benchmark (2× RTX 3080 20GB) comparing llama.cpp (row/tensor split) with ik_llama (graph split) on Qwen3.6-27B-Q8_0.
Row split: 1732 t/s prompt processing, 23 t/s generation, VRAM 18.2/18.5 GB.
Tensor and graph split results are incomplete in the excerpt.
Llama · Benchmarks · Code generation · Infrastructure

Summary generated by Claude — human-verified