Claude Opus 4.8 is out. The benchmark isn't why I'm switching.


Anthropic shipped Claude Opus 4.8 today. The benchmark numbers went up, as they always do. But that's not why I'm switching my default model, and I want to explain the part that actually changed how I work.

The numbers, quickly

The highlights from the official comparison:

SWE-Bench Pro: 69.2%, up from 64.3% on 4.7 and well ahead of GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%).
Computer use (OSWorld-Verified): 83.4%, still the model to beat for clicking around real UIs.
Knowledge work (GDPval-AA): 1890 vs 1769 for GPT-5.5.