PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

arXiv:2605.26730v1 Announce Type: new The rapid growth in submissions to machine learning venues has strained the scientific peer-review system and intensified interest in LLM-based automated peer reviewers. However, how well these systems actually perform, especially compared with human reviewers at catching scientific gaps, remains poorly understood. In this work, we