COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

arXiv:2606.04749v1 Announce Type: cross

Safe robot control requires maximizing return while satisfying safety constraints. In off-policy safe reinforcement learning, reward and safety Q-values are commonly learned by separate critic ensembles, with uncertainty handled independently for each objective. This objective-wise treatment neglects inter-objective correlation and can lead to overly conservative value estimates, thereby reducing sample efficiency.
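
To make the objective-wise baseline concrete, the following is a minimal NumPy sketch of the conventional treatment described above, not of COP-Q itself. It assumes, purely for illustration, that the reward and safety ensembles have the same size and that members can be paired by index; the function names, the pessimism coefficient `k`, and the synthetic ensemble values are all hypothetical.

```python
import numpy as np


def objective_wise_bounds(q_reward, q_cost, k=1.0):
    """Pessimistic bounds computed separately for each objective.

    q_reward, q_cost: 1-D arrays of ensemble estimates for one (state, action).
    k: hypothetical pessimism coefficient (an assumption, not from the paper).
    """
    # Lower confidence bound on the reward value.
    reward_lcb = q_reward.mean() - k * q_reward.std(ddof=1)
    # Upper confidence bound on the safety cost value.
    cost_ucb = q_cost.mean() + k * q_cost.std(ddof=1)
    return reward_lcb, cost_ucb


def joint_covariance(q_reward, q_cost):
    """2x2 sample covariance of index-paired (reward, cost) estimates.

    The off-diagonal entry is the inter-objective term that the
    objective-wise bounds above never look at.
    """
    return np.cov(np.stack([q_reward, q_cost]), ddof=1)


# Synthetic ensemble outputs that share a common error source, so the
# reward and cost estimates are strongly correlated (illustrative only).
rng = np.random.default_rng(0)
shared = rng.normal(size=5)
q_reward = 10.0 + 1.0 * shared + 0.1 * rng.normal(size=5)
q_cost = 2.0 + 0.5 * shared + 0.1 * rng.normal(size=5)

reward_lcb, cost_ucb = objective_wise_bounds(q_reward, q_cost)
cov = joint_covariance(q_reward, q_cost)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(f"reward LCB: {reward_lcb:.3f}, cost UCB: {cost_ucb:.3f}")
print(f"inter-objective correlation ignored by the bounds: {corr:.3f}")
```

The two bounds use only the diagonal of the covariance matrix. When the estimates are correlated, applying both margins at once can be more pessimistic than the joint uncertainty warrants, which is the conservatism the abstract refers to.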