Reddit r/LocalLLaMA

Another shout out to llama.cpp build b9455 2x3090

Signal: 72 · Hype: 25
In three lines: llama.cpp build b9455 with tensor-split reaches 70+ tokens/s on Qwen3.6-27B-UD-Q8_K_XL on 2x3090, matching vLLM performance. MTP speculative decoding and flash attention are enabled. Context goes up to 262K tokens, with prefill at 1400+ t/s.
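To put those throughput figures in wall-clock terms, here is a rough back-of-the-envelope sketch. It assumes "262K" means 262,144 tokens, treats the quoted rates as constant floors (real prefill speed usually drops as the context fills), and uses a hypothetical 1,000-token reply; none of these assumptions come from the original post.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to process `tokens` at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return tokens / tokens_per_second


PREFILL_TPS = 1400.0     # "1400+ t/s" from the summary, taken as a floor
DECODE_TPS = 70.0        # "70+ tokens/s" from the summary, taken as a floor
FULL_CONTEXT = 262_144   # assumption: "262K" means 262,144 tokens
REPLY_TOKENS = 1_000     # hypothetical reply length, for illustration only

prefill_s = seconds_for(FULL_CONTEXT, PREFILL_TPS)
decode_s = seconds_for(REPLY_TOKENS, DECODE_TPS)

print(f"prefill of a full context: ~{prefill_s:.0f} s")
print(f"{REPLY_TOKENS}-token reply: ~{decode_s:.1f} s")
```

Under these assumptions, filling the whole context window takes on the order of a few minutes, while a typical reply streams in well under half a minute.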
Tags: Llama · Qwen · Code generation · Open source · Infrastructure

Summary generated by Claude — human-verified