Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

arXiv:2605.29788v1 Announce Type: new

Critical sequential decisions are rarely single-timescale: a strategic decision causally shapes the context in which every subsequent tactical choice is made. Standard bandit and reinforcement-learning theory does not capture this causal coupling between timescales.