Qwen3.6-MTP-27B on Tesla V100 @ 55 TPS (llama.cpp) — Any way to push this higher without quality loss?
Signal: 35
Hype: 15
In three lines
A user runs Qwen3.6-MTP-27B-Q4_K_M on a Tesla V100 with llama.cpp and reaches 55 TPS (tokens per second). They want higher throughput without quality loss through configuration tuning (parallel, spec-draft-n-max, KV cache quantization). They also ask whether the 262144-token context size affects performance.
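To put the context-size and KV cache quantization questions in numbers, here is a minimal back-of-the-envelope sketch of the usual KV cache size estimate for a plain transformer (keys plus values, per layer, per KV head, per token). The layer count, KV head count, and head dimension below are hypothetical placeholders, not the real Qwen3.6-MTP-27B architecture, which the post does not state; a model with hybrid or sliding-window attention would need far less. The bytes-per-element figures follow the ggml block layouts for `f16`, `q8_0`, and `q4_0`.

```python
# Rough KV cache size estimate for a standard (full-attention) transformer.
# Bytes per element for common llama.cpp KV cache types:
#   f16  : 2 bytes per value
#   q8_0 : 34 bytes per block of 32 values (2-byte scale + 32 int8)
#   q4_0 : 18 bytes per block of 32 values (2-byte scale + 16 packed bytes)
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_gib(n_ctx, n_layers, n_kv_heads, head_dim, cache_type="f16"):
    """Return the estimated KV cache size in GiB for one sequence of n_ctx tokens."""
    # Factor 2 accounts for storing both keys and values.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * BYTES_PER_ELEM[cache_type] / 2**30


# HYPOTHETICAL architecture values, chosen only for illustration.
HYPOTHETICAL_ARCH = dict(n_layers=64, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for cache_type in BYTES_PER_ELEM:
        size = kv_cache_gib(262144, cache_type=cache_type, **HYPOTHETICAL_ARCH)
        print(f"{cache_type:>5}: {size:.1f} GiB at n_ctx=262144")
```

Under these assumed dimensions the estimate scales linearly with context length, so halving the context halves the cache, and a quantized cache type shrinks it by the ratio of bytes per element. Whether that memory saving turns into higher tokens per second on a given card is a separate question that only a benchmark on that setup can answer.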
Summary generated by Claude — human-verified