Reddit r/LocalLLaMA

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp

Signal: 72 · Hype: 25
In three lines: ik_llama.cpp outperforms llama.cpp on an RTX 4070 Super 12GB, averaging 110 tok/s vs. 90.6 tok/s with Qwen3.6-35B-A3B-IQ4_XS. The gain comes from better CPU offloading optimization and speculative decoding (MTP), and follows a llama.cpp performance regression after a merge.
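To put the two throughput figures in proportion, here is a minimal sketch of the arithmetic. It uses only the two averages quoted in the summary (110 tok/s and 90.6 tok/s); the function names and the 1000-token response length are illustrative assumptions, not part of the original benchmark.

```python
def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Ratio of two throughputs; a value above 1 means the first is faster."""
    if new_tok_s <= 0 or old_tok_s <= 0:
        raise ValueError("throughputs must be positive")
    return new_tok_s / old_tok_s


def seconds_for(tokens: int, tok_s: float) -> float:
    """Time to generate `tokens` tokens at a steady `tok_s` tokens per second."""
    return tokens / tok_s


# Averages quoted in the summary.
IK_TOK_S = 110.0        # ik_llama.cpp
MAINLINE_TOK_S = 90.6   # llama.cpp

ratio = speedup(IK_TOK_S, MAINLINE_TOK_S)
print(f"speedup: {ratio:.2f}x ({(ratio - 1) * 100:.1f}% more tokens per second)")

# Hypothetical 1000-token response, chosen only for illustration.
print(
    f"1000 tokens: {seconds_for(1000, IK_TOK_S):.1f}s "
    f"vs {seconds_for(1000, MAINLINE_TOK_S):.1f}s"
)
```

On these numbers the gap is roughly 1.21x, or about two seconds saved per 1000 generated tokens, assuming throughput stays constant over the whole response.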
Qwen · Open source · Infrastructure · Benchmarks

Summary generated by Claude — human-verified