Risk-Controlled Lean-as-Judge for Natural-Language Mathematical Reasoning
arXiv cs.AI
arXiv:2605.28365v1 (Announce Type: new)
Lean is increasingly used to judge natural-language mathematical answers, but its signal is partial: many answers are never successfully formalized, and a failed proof may reflect an ill-typed statement or a missing library fact rather than a wrong answer.
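The asymmetry described above can be illustrated with a small sketch. This is not the paper's method. It is a hypothetical three-way mapping that shows why a Lean failure should not be read as a rejection. The enum names `LeanOutcome` and `Verdict` and the functions `lean_verdict` and `coverage` are illustrative assumptions. Only a kernel-checked proof counts as positive evidence. Every failure mode (no formalization, ill-typed statement, unclosed proof) maps to "undetermined". `coverage` reports the fraction of answers on which Lean gives a definite signal.

```python
from enum import Enum


class LeanOutcome(Enum):
    """Hypothetical labels for the result of one Lean check of an answer."""
    NOT_FORMALIZED = "not_formalized"  # no formal statement was produced
    ILL_TYPED = "ill_typed"            # the statement failed to elaborate
    PROOF_FAILED = "proof_failed"      # statement typechecks, proof did not close
    PROVED = "proved"                  # Lean accepted the proof


class Verdict(Enum):
    CORRECT = "correct"
    UNDETERMINED = "undetermined"


def lean_verdict(outcome: LeanOutcome) -> Verdict:
    # Only a checked proof is treated as positive evidence. Each failure mode
    # is ambiguous (wrong answer vs. bad formalization vs. missing library
    # fact), so none of them is mapped to "incorrect".
    if outcome is LeanOutcome.PROVED:
        return Verdict.CORRECT
    return Verdict.UNDETERMINED


def coverage(outcomes: list[LeanOutcome]) -> float:
    """Fraction of answers on which Lean yields a definite verdict."""
    if not outcomes:
        return 0.0
    decided = sum(lean_verdict(o) is Verdict.CORRECT for o in outcomes)
    return decided / len(outcomes)


if __name__ == "__main__":
    batch = [
        LeanOutcome.PROVED,
        LeanOutcome.NOT_FORMALIZED,
        LeanOutcome.PROOF_FAILED,
        LeanOutcome.PROVED,
    ]
    print(coverage(batch))  # 2 of 4 answers get a definite verdict -> 0.5
```

In this batch, Lean decides only half of the answers. The remaining cases would need another judge or an abstention policy, which is where a risk-controlled scheme would come in.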