Expected Value Alignment for Generative Reward Modeling in Formal Mathematics Verification
arXiv CS.AI
arXiv:2606.01160v1 Announce Type: new
Large Language Models (LLMs) are increasingly used with formal interactive theorem provers such as Lean 4. Scaling these systems with reinforcement learning or search methods requires process reward models (PRMs) that can evaluate intermediate reasoning steps. Existing reward-model designs expose a practical trade-off.