Reddit r/LocalLLaMA · 28 May 2026

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B

In three lines: Sigilant-sweep, an open-source CLI for llama.cpp and vLLM, benchmarks 16 configurations (quantizations, KV cache, context). On Qwen2.5-7B, Q4_K_M beats Q8_0 by 230 ms on TTFT and by 10.7 TPS. The tool measures TPS, TTFT, and perplexity (PPL), reports p50/p95 values, and ranks configurations with weighted scoring (latency, quality, or balanced).
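To make the p50/p95 and weighted-scoring idea concrete, here is a minimal sketch of how such a ranking could work. It is not the tool's actual code: the weight values, the min-max normalization, and the names `PROFILES`, `percentile`, `normalize`, and `score` are all assumptions made for illustration, since the summary does not say how the scores are computed.

```python
from statistics import quantiles

# Hypothetical weight profiles. The summary names the three profiles
# but not their weights, so these numbers are illustrative only.
PROFILES = {
    "latency":  {"tps": 0.4, "ttft": 0.5, "ppl": 0.1},
    "quality":  {"tps": 0.1, "ttft": 0.1, "ppl": 0.8},
    "balanced": {"tps": 1 / 3, "ttft": 1 / 3, "ppl": 1 / 3},
}


def percentile(samples, p):
    """Return the p-th percentile (1..99) of at least two samples."""
    # quantiles(n=100) yields 99 cut points; index p - 1 is the p-th one.
    return quantiles(samples, n=100, method="inclusive")[p - 1]


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best config."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def score(configs, profile):
    """Map each config name to a weighted score under the given profile."""
    w = PROFILES[profile]
    tps = normalize([c["tps"] for c in configs], higher_is_better=True)
    ttft = normalize([c["ttft_ms"] for c in configs], higher_is_better=False)
    ppl = normalize([c["ppl"] for c in configs], higher_is_better=False)
    return {
        c["name"]: w["tps"] * t + w["ttft"] * f + w["ppl"] * q
        for c, t, f, q in zip(configs, tps, ttft, ppl)
    }
```

Each config here is a dict with `name`, `tps`, `ttft_ms`, and `ppl` keys, where the latency fields would typically hold the p50 or p95 of repeated runs. Normalizing before weighting keeps metrics with different units comparable, which is one plausible reason a faster but slightly less accurate quantization can rank first under a latency profile and second under a quality profile.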

Summary generated by Claude — human-verified
