Decision-Focused On-Policy Learning for Contextual Linear Optimization with Partial Feedback

arXiv:2606.01081v1 Announce Type: new Decision-focused learning (DFL) trains predictive models by optimizing downstream decision quality rather than standalone prediction accuracy. For contextual linear optimization, most existing DFL methods assume offline data and full observations of the objective cost vector. We develop an on-policy learning method for sequential contextual linear optimization under partial feedback, generalizing the standard bandit feedback setting.