MLReplicate: Benchmarking Autonomous Research Systems for Machine Learning Reproducibility

Autonomous research systems capable of generating complete scientific manuscripts have advanced rapidly, yet robust and realistic evaluation frameworks have failed to keep pace. To bridge this gap, we