RISE: Reliable Improvement in Self-Evolving Vision-Language Models

arXiv:2605.20914v1

Vision-language models (VLMs) have achieved strong multimodal reasoning capabilities, but further improving them still relies heavily on large-scale human-constructed supervision for post-