VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

arXiv:2511.11896v3 Announce Type: replace-cross Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabilities in real-world repositories requires reasoning over complex contextual interactions. Existing LLM-based VD approaches remain limited because current datasets lack complete contextual information and high-quality reasoning supervision, while existing optimization methods primarily rely on coarse outcome-centric supervision signals that fail to model the vulnerability reasoning process.