TreeDQN: Sample-Efficient Off-Policy Reinforcement Learning for Combinatorial Optimization

arXiv:2306.05905v2 Announce Type: replace A convenient approach to solving combinatorial optimization tasks to optimality is the Branch-and-Bound method. Its branching heuristic can be learned in order to solve a large set of similar tasks. Promising results have been achieved by a recently proposed on-policy reinforcement learning method based on the tree Markov Decision Process. To overcome its main disadvantages, namely, very large