Verifying Meta-Awareness via Predictive Rewards in Reasoning Models

arXiv:2510.03259v2

Recent research on reasoning models explores the meta-awareness of language models, including their ability to determine optimal thinking duration, recognize knowledge boundaries, and structure concept-level thinking. While current large reasoning models depend solely on answer-based verification, we show that adding meta-awareness objectives leads to significant performance gains over models without such meta-knowledge.