Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B
Signal: 72
Hype: 28
In three lines
Sigilant-sweep, an open-source CLI for llama.cpp and vLLM, benchmarks 16 configurations (quantization, KV cache settings, context length). On Qwen2.5-7B, Q4_K_M beats Q8_0 with 230 ms lower TTFT and 10.7 more TPS. The tool measures TPS (tokens per second), TTFT (time to first token), and PPL (perplexity), reports p50/p95, and ranks configurations with weighted scoring profiles (latency, quality, balanced).
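The summary does not say how the weighted scoring is computed, so the following is only a minimal sketch of one way p50/p95 and profile-based ranking could work. The weight values, the min-max normalization, and every name here (`PROFILES`, `p50_p95`, `normalize`, `rank`, the `config_a`/`config_b` keys) are assumptions for illustration, not Sigilant-sweep's actual code or API.

```python
from statistics import median, quantiles

# Hypothetical weight profiles: the real tool's weights are not given in the summary.
PROFILES = {
    "latency":  {"tps": 0.4, "ttft": 0.5, "ppl": 0.1},
    "quality":  {"tps": 0.1, "ttft": 0.1, "ppl": 0.8},
    "balanced": {"tps": 1 / 3, "ttft": 1 / 3, "ppl": 1 / 3},
}


def p50_p95(samples):
    """Return (p50, p95) for a list of at least two measurements."""
    cuts = quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return median(samples), cuts[94]  # cuts[94] is the 95th percentile


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 always marks the best config."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(configs, profile):
    """Rank configs (name -> metrics dict) by weighted score, best first."""
    w = PROFILES[profile]
    names = list(configs)
    tps = normalize([configs[n]["tps"] for n in names], True)
    ttft = normalize([configs[n]["ttft_ms"] for n in names], False)
    ppl = normalize([configs[n]["ppl"] for n in names], False)
    scores = {
        n: w["tps"] * t + w["ttft"] * f + w["ppl"] * p
        for n, t, f, p in zip(names, tps, ttft, ppl)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers, not measurements from the tool.
    demo = {
        "config_a": {"tps": 50.0, "ttft_ms": 200.0, "ppl": 7.0},
        "config_b": {"tps": 40.0, "ttft_ms": 400.0, "ppl": 6.8},
    }
    for profile in PROFILES:
        print(profile, rank(demo, profile))
```

Under this scheme a faster but slightly less accurate configuration wins the latency profile and loses the quality profile, which is the kind of trade-off a weighted score is meant to make explicit.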
Summary generated by Claude — human-verified