CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

arXiv:2605.23491v1. Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR methods require them for costly