Reddit r/LocalLLaMA

DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks

Signal: 35
Hype: 45
In three lines: A Reddit user reports that DeepSeek v4 Pro achieves an 8% pass rate on the DeepSWE benchmark, in contrast to their impression that it performs at near parity with Claude Sonnet 4.6 in practice. A link to the DeepSWE benchmark is provided.
DeepSeek · Benchmarks · Code generation

Summary generated by Claude — human-verified