Self-Evolving Deep Research via Joint Generation and Evaluation

arXiv:2606.04507v1 Announce Type: cross Large Language Models (LLMs) are increasingly adopted in everyday applications, and deep research stands out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks a definitive ground truth, so rewards cannot be verified directly, which limits the effectiveness of reinforcement learning.