AI RESEARCH
Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator
arXiv CS.LG
arXiv:2602.01460v3 Announce Type: replace-cross
Policy-gradient methods are widely used in reinforcement learning, yet