How can DeepSeek v4 top the coding leaderboards and still sit 8 months behind the frontier?
Signal: 45
Hype: 55
In three lines
DeepSeek v4 Pro scores 80.6 on SWE-bench and 93.5 on LiveCodeBench, but CAISI rates it 8 months behind the US frontier, versus the 2 months DeepSeek claims. Coding benchmarks are narrow and heavily optimized; gaps emerge in cybersecurity and abstract reasoning. Quantized local versions drift further from the headline scores.
Summary generated by Claude — human-verified