New DeepSWE benchmark finds Claude Opus cheats

r/LocalLLaMA

Sadly, the open models seem far behind.

submitted by /u/DeltaSqueezer