Reddit r/LocalLLaMA·12 June 2026

Comparing dual-GPU inference speed between llama.cpp row/tensor split and ik_llama graph split

In three lines
Dual-GPU benchmark (2× RTX 3080 20GB) comparing llama.cpp (row/tensor split) with ik_llama (graph split) on Qwen3.6-27B-Q8_0. Row split: 1732 t/s prompt processing, 23 t/s generation, VRAM 18.2/18.5 GB. Tensor and graph split results are incomplete in the excerpt.
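
Because the row-split figures are rates, a quick back-of-envelope calculation shows what they mean for a single request. This is an illustrative sketch only. The 4096-token prompt and 512-token reply are hypothetical, and the helper `request_seconds` is a made-up name. The sketch also treats both rates as constant and ignores model load and other overhead, although real throughput typically drops as the context grows.

```python
# Row-split throughput reported in the summary above.
PROMPT_TPS = 1732.0  # prompt processing, tokens per second
GEN_TPS = 23.0       # token generation, tokens per second


def request_seconds(prompt_tokens: int, output_tokens: int,
                    prompt_tps: float = PROMPT_TPS,
                    gen_tps: float = GEN_TPS) -> float:
    """Hypothetical helper: estimated wall-clock seconds for one request.

    Assumes constant throughput and ignores load time and other overhead.
    """
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


if __name__ == "__main__":
    # Hypothetical workload: 4096-token prompt, 512-token reply.
    prefill = 4096 / PROMPT_TPS
    decode = 512 / GEN_TPS
    total = request_seconds(4096, 512)
    print(f"prefill {prefill:.1f} s, decode {decode:.1f} s, total {total:.1f} s")
    print(f"per generated token: {1000 / GEN_TPS:.1f} ms")
```

With these assumptions the decode phase dominates. Generating 512 tokens at 23 t/s takes roughly ten times as long as prefilling 4096 tokens at 1732 t/s, so the generation rate is the figure most worth comparing across split modes for chat-style use.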

Llama · Benchmarks · Code generation · Infrastructure

Summary generated by Claude — human-verified