Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

arXiv:2605.31273v1

While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully training networks more than 64 layers deep, scaled CRL still struggles with long-horizon goal-conditioned planning because of the uniformity-tolerance dilemma inherent in contrastive losses. We