Reddit r/LocalLLaMA · 10 June 2026

Qwen3.6-MTP-27B on Tesla V100 @ 55 TPS (llama.cpp) — Any way to push this higher without quality loss?

In three lines: A user runs Qwen3.6-MTP-27B-Q4_K_M on a Tesla V100 with llama.cpp and gets 55 TPS (tokens per second). They want higher throughput without quality loss through configuration tuning (parallel, spec-draft-n-max, KV cache quantization). They also ask whether the 262144 context size affects performance.
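
One way to reason about the context-size and KV cache quantization questions is to estimate how much memory the KV cache would need at 262144 tokens. The sketch below is only back-of-the-envelope arithmetic: the layer count, KV head count, and head dimension are hypothetical placeholders (the post does not state the model's architecture), and it assumes every layer keeps a full-attention KV cache, which may not hold for this model. The bytes-per-block figures follow llama.cpp's f16, q8_0, and q4_0 storage layouts (32 elements per block).

```python
# Rough KV cache size estimate. All architecture numbers below are
# HYPOTHETICAL placeholders, not the real Qwen3.6-MTP-27B configuration.

# Bytes needed to store one block of 32 elements in each cache type:
#   f16  -> 32 * 2 bytes
#   q8_0 -> 32 int8 values + one fp16 scale = 34 bytes
#   q4_0 -> 16 bytes of packed 4-bit values + one fp16 scale = 18 bytes
BYTES_PER_BLOCK = {"f16": 64, "q8_0": 34, "q4_0": 18}
BLOCK_SIZE = 32


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, cache_type: str = "f16") -> int:
    """Return the estimated KV cache size in bytes for one sequence.

    Assumes every layer stores both K and V for every token (full attention).
    """
    # Factor 2 covers the separate K and V tensors.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elements * BYTES_PER_BLOCK[cache_type] // BLOCK_SIZE


if __name__ == "__main__":
    # Hypothetical architecture: 64 layers, 8 KV heads, head dimension 128.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(64, 8, 128, 262144, cache_type)
        print(f"{cache_type}: {size / 2**30:.1f} GiB")
```

Under these placeholder numbers, a full 262144-token f16 cache would be larger than the memory of any single V100, so a smaller context, a quantized cache, or both would be needed. Whether cache quantization counts as "without quality loss" is something the user would have to measure for their own workload.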

Qwen · Code generation · Benchmarks · Infrastructure

Summary generated by Claude — human-verified
