Bayesian learning for the stochastic shortest path problem
arXiv CS.LG
Sequential decision-making problems are often modelled as a Markov decision process (MDP). We focus on the stochastic shortest path (SSP) problem, which is an infinite-horizon undiscounted MDP with absorbing terminal states. Specifically, we learn the optimal action-value function $Q^*$, but unlike many existing Bayesian approaches, we do not rely on unrealistic modelling as