Adaptive Scaling of Policy Constraints for Offline Reinforcement Learning

arXiv:2508.19900v2 Announce Type: replace

Offline reinforcement learning (RL) enables learning effective policies from fixed datasets without any environment interaction. Existing methods typically employ policy constraints to mitigate the distribution shift encountered during offline