Reddit r/LocalLLaMA

Built a config sweep CLI for llama.cpp and vLLM and found out Q4_K_M beat Q8_0 by 230ms TTFT on Qwen2.5-7B

Signal: 72 · Hype: 28
In three lines: Sigilant-sweep, an open-source CLI for llama.cpp and vLLM, benchmarks 16 configurations that vary quantization, KV cache, and context settings. On Qwen2.5-7B, Q4_K_M beats Q8_0 by 230 ms in time to first token (TTFT) and by 10.7 tokens per second (TPS). The tool measures TPS, TTFT, and perplexity (PPL) with p50/p95 percentiles, and ranks configurations with weighted scoring profiles (latency, quality, balanced).
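To make the p50/p95 and weighted-scoring idea concrete, here is a minimal Python sketch. It is not taken from the tool's source: the nearest-rank percentile method, the min-max normalization, the weight values in `PROFILES`, and the names `RunStats`, `percentile`, `normalize`, and `score` are all assumptions chosen for illustration. The config names and metric values in any usage are placeholders, not measurements.

```python
import math
from dataclasses import dataclass


def percentile(samples, pct):
    """Nearest-rank percentile (assumed method; the tool may interpolate)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass
class RunStats:
    name: str        # configuration label (hypothetical)
    tps: float       # tokens per second, higher is better
    ttft_ms: float   # time to first token in ms, lower is better
    ppl: float       # perplexity, lower is better


# Hypothetical weights per profile; the real tool's weights are not given
# in the summary.
PROFILES = {
    "latency": {"tps": 0.3, "ttft": 0.6, "ppl": 0.1},
    "quality": {"tps": 0.1, "ttft": 0.1, "ppl": 0.8},
    "balanced": {"tps": 0.4, "ttft": 0.3, "ppl": 0.3},
}


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def score(runs, profile):
    """Weighted sum of normalized metrics for each run under one profile."""
    w = PROFILES[profile]
    tps = normalize([r.tps for r in runs], higher_is_better=True)
    ttft = normalize([r.ttft_ms for r in runs], higher_is_better=False)
    ppl = normalize([r.ppl for r in runs], higher_is_better=False)
    return {
        r.name: w["tps"] * t + w["ttft"] * f + w["ppl"] * p
        for r, t, f, p in zip(runs, tps, ttft, ppl)
    }
```

Under these assumptions, a configuration that is faster but has higher perplexity would rank first under `latency` and could rank last under `quality`, which is why a sweep like this reports more than one profile.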
Tags: Llama · Benchmarks · Open source · Tools · Infrastructure

Summary generated by Claude — human-verified