AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research

arXiv:2507.08038v3

Language model agents are increasingly used to automate scientific research, yet evaluating their scientific contributions remains a challenge. A key mechanism for obtaining such insights is ablation experimentation. To this end, we