Reddit r/LocalLLaMA

How can DeepSeek v4 top the coding leaderboards and still sit 8 months behind the frontier?

Signal: 45 · Hype: 55
In three lines: DeepSeek v4 Pro scores 80.6 on SWE-bench and 93.5 on LiveCodeBench, yet CAISI rates it 8 months behind the US frontier (DeepSeek's own estimate is 2 months). Coding benchmarks are narrow and heavily optimized for; the gaps emerge in cybersecurity and abstract reasoning. Quantized local versions drift further from the headline scores.
Tags: DeepSeek, Benchmarks, Code generation, Reasoning, AI Agents

Summary generated by Claude — human-verified