Reddit r/LocalLLaMA · 20 May 2026

Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell

In three lines

A user runs Qwen3.6-27B via llama.cpp on two Blackwell 6000 MaxQ GPUs with an AMD Epyc CPU, achieving 100-110 t/s. They ask whether there is room for further optimization: the cards run at 250/300W, with ~20GB VRAM free. The setup includes flash attention, speculative decoding (draft-MTP), a batch size of 6144, and a 1M context.
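To put the reported 100-110 t/s in perspective, here is a minimal sketch that converts throughput into average per-token latency and total generation time. The 2000-token reply length is an illustrative assumption, not a figure from the post, and the helper names are hypothetical.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert decode throughput (tokens/s) to average latency per token (ms)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Throughput range reported in the post; 2000 tokens is an assumed reply length.
for tps in (100.0, 110.0):
    print(
        f"{tps:.0f} t/s -> {ms_per_token(tps):.2f} ms/token, "
        f"2000 tokens in {seconds_for(2000, tps):.1f} s"
    )
```

At these rates the gap between the low and high end of the reported range is under 1 ms per token, so any further tuning would likely need to be measured over long generations to show up clearly.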

Llama Open source Code generation Infrastructure

Summary generated by Claude — human-verified
