SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More

Hi all, sorry for going missing. We’ve been collecting a larger, higher-quality set of complex tasks, and we’re excited to share a major leaderboard update covering the past three months.

We’ve added 110 fresh Python tasks to the SWE-rebench leaderboard, drawn from GitHub PRs created in March, April, and part of May. The setup follows the standard SWE-bench format: models read the real issue behind each PR, edit the code, run the tests, and must make the full test suite pass.
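To make the pass criterion concrete, here is a minimal sketch of how "the full test suite must pass" turns into a per-task verdict and a leaderboard-style resolved rate. This is not the SWE-rebench harness. `TaskResult`, `is_resolved`, and `resolved_rate` are hypothetical names, and the sketch assumes you already have per-test pass/fail outcomes after applying a model's patch.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical record: per-test outcomes after applying the model's patch
    task_id: str
    test_outcomes: dict[str, bool]  # test name -> passed?


def is_resolved(result: TaskResult) -> bool:
    # A task counts only if every test in the suite passes;
    # an empty outcome map is treated as not resolved
    return bool(result.test_outcomes) and all(result.test_outcomes.values())


def resolved_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks resolved; 0.0 when there are no results
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


if __name__ == "__main__":
    runs = [
        TaskResult("task-a", {"test_parse": True, "test_edge_case": True}),
        TaskResult("task-b", {"test_parse": True, "test_edge_case": False}),
    ]
    print(resolved_rate(runs))  # one of two tasks fully passes
```

A single failing test marks the whole task as unresolved, which is what makes this all-or-nothing criterion stricter than counting individual passing tests.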