SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

arXiv:2605.31433v1 Announce Type: new
Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We