I just created a detailed report based on the DeepSWE benchmark data
r/singularity
AI Research
I wanted a bit more detail on how each model performed in terms of price and performance, so I put together this report (with the help of AI) to make it easier to explore the significant findings in the DeepSWE data. I also added my own benchmark run of Mimo V2.5 (the non-pro version) and adjusted the pricing to reflect the recent pricing changes. In terms of observations, I found it interesting that many of the open-weights models end up being astronomically expensive when measured as cost per pass. Time per pass was also an interesting statistic.
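For anyone unsure what those two metrics mean, here is a minimal sketch of how they might be computed. It assumes "cost per pass" is total spend divided by the number of tasks passed, and "time per pass" is total wall-clock time divided by the same count; the report may define them differently. The `RunSummary` type, the model names, and all numbers are hypothetical placeholders, not DeepSWE results.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Aggregate totals for one model's benchmark run (hypothetical shape)."""
    model: str
    tasks: int             # tasks attempted
    passed: int            # tasks that passed
    total_cost_usd: float  # total spend across all attempts, pass or fail
    total_seconds: float   # total wall-clock time across all attempts


def cost_per_pass(run: RunSummary) -> float:
    """Total spend divided by passes; infinite if nothing passed."""
    return run.total_cost_usd / run.passed if run.passed else float("inf")


def time_per_pass(run: RunSummary) -> float:
    """Total seconds divided by passes; infinite if nothing passed."""
    return run.total_seconds / run.passed if run.passed else float("inf")


# Made-up numbers: a model that is cheap per run but rarely passes,
# and one that costs more per run but passes often.
cheap = RunSummary("cheap-model", tasks=100, passed=5,
                   total_cost_usd=20.0, total_seconds=36_000.0)
pricey = RunSummary("pricey-model", tasks=100, passed=60,
                    total_cost_usd=90.0, total_seconds=18_000.0)

for run in (cheap, pricey):
    print(f"{run.model}: ${cost_per_pass(run):.2f} per pass, "
          f"{time_per_pass(run):.0f} s per pass")
```

Because failed attempts still cost money and time, a low pass rate inflates both numbers. In this made-up example the model with the lower total bill ends up more expensive per pass.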