Reinforcement Learning: The Post-Training Engine Behind Reasoning Models

Reinforcement learning used to feel like a branch of AI reserved for games, robotics, recommendation systems, and control. It was the world of agents, environments, rewards, policies, simulators, self-play, exploration, and long-horizon decisions. The defining question was simple to state but difficult to solve: how should an agent act in an environment to maximise future reward?

Then large language models changed the centre of gravity. The early LLM era was dominated by self-supervised pre