Claude Opus 4.8 is out. The benchmark isn't why I'm switching.
Dev.to AI
Anthropic shipped Claude Opus 4.8 today. The benchmark numbers went up, as they always do. But that's not why I'm switching my default model, and I want to explain the part that actually changed how I work.

The numbers, quickly

The highlights from the official comparison:

SWE-Bench Pro: 69.2% - up from 64.3% on 4.7, well ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).
Computer use (OSWorld-Verified): 83.4% - still the model to beat for clicking around real UIs.
Knowledge work (GDPval-AA): 1890 vs 1769 for GPT-5.5.