A Contractive Feedback Semantics for Reinforcement Learning

Discounted reinforcement learning is usually presented through Bellman equations on closed Markov decision processes. This paper develops a compositional view: a one-step decision process is treated as an open stochastic component, and infinite-horizon policy evaluation is obtained by closing a contractive feedback loop. The resulting semantics assigns typed Bellman transformers to open components, interprets series and parallel wiring as composition and tensoring of transformers, and interprets.
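
As an informal illustration of the feedback view, the following minimal sketch treats a one-step component under a fixed policy as a map from a continuation value to a present value, and evaluates the policy by iterating that map to its fixed point. It assumes a finite state space, a row-stochastic matrix P, a reward vector r, and a discount gamma < 1, so that the map is a sup-norm contraction. The names bellman_transformer, series, and close_feedback are hypothetical and chosen for this example only; they are not the paper's notation.

```python
from typing import Callable, List

Value = List[float]
Transformer = Callable[[Value], Value]


def bellman_transformer(P: List[List[float]], r: List[float], gamma: float) -> Transformer:
    """Open one-step component: maps a continuation value v to r + gamma * P v."""
    def T(v: Value) -> Value:
        return [
            r[s] + gamma * sum(P[s][t] * v[t] for t in range(len(v)))
            for s in range(len(r))
        ]
    return T


def series(T1: Transformer, T2: Transformer) -> Transformer:
    """Series wiring (assumed here to be plain composition): T2's output is T1's continuation."""
    return lambda v: T1(T2(v))


def close_feedback(T: Transformer, n: int, tol: float = 1e-10, max_iter: int = 10_000) -> Value:
    """Close the loop: iterate T from the zero vector until successive iterates agree to tol."""
    v = [0.0] * n
    for _ in range(max_iter):
        w = T(v)
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Illustrative two-state chain: state 1 is absorbing with zero reward.
P = [[0.5, 0.5], [0.0, 1.0]]
r = [1.0, 0.0]
T = bellman_transformer(P, r, gamma=0.9)

v = close_feedback(T, n=2)                 # value of the closed system
v2 = close_feedback(series(T, T), n=2)     # two steps wired in series, then closed
```

In this example the value of state 0 solves v0 = 1 + 0.45 v0, so v0 = 1/0.55, and closing the loop around two copies of the step wired in series yields the same value, since the composite shares the fixed point of T.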