ORLoopBench: Solver-in-the-Loop Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

ArXi:2601.21008v3 Announce Type: replace-cross Operations Research practitioners debug infeasible models through an iterative process: inspecting Irreducible Infeasible Subsystems ( IIS), identifying constraint conflicts, and repairing formulations until feasibility is red. Existing LLM benchmarks mostly treat OR as one-shot translation from problem descriptions to solver code, omitting this diagnostic loop.