The More I Tuned My Reward Function, The Worse My RL Agent Got

Towards AI
Robotics Reinforcement Learning

A practical lesson from building a drone navigation agent, and why simpler rewards often win in reinforcement learning

Figure 1: One reward function, three difficulty levels, and the failure mode becomes obvious. The agent solves the easy task cleanly and still reaches the goal at medium difficulty with some inefficiency, but in hard mode it does not crash: it hesitates, loses long-horizon progress, and times out.