Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking

arXiv:2605.26385v1 Announce Type: cross Large-scale search, recommendation, and retrieval-augmented generation (RAG) systems typically employ a two-stage architecture: an early-stage ranker (ESR) generates a candidate set, which is subsequently re-ranked by a late-stage ranker (LSR). While there are many reinforcement learning (RL) methods for