DeepSWE benchmarks indicate that DeepSeek v4 Pro only passes 8% of tasks

Is this accurate? I use DS v4 in OpenCode and find it nearly on par with Sonnet 4.6, so I'm surprised the score is so low.

submitted by /u/Federal_Spend2412