Back to feed
Reddit r/LocalLLaMA·

Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster.

Signal
78
Hype
15
In three linesCPU benchmark of Needle (26M) vs Qwen3-0.6B on function calling: 50 queries across 5 difficulty tiers. Needle wins on accuracy (72% vs 56% tool_match) and latency (10.9s vs 47.9s). Needle fails on tool selection, Qwen3 on tag emission. Qwen3 dominates on multilingual queries (Hindi, French).
Read source
Your take?
QwenBenchmarksCode generationOpen source

Summary generated by Claude — human-verified