AI RESEARCH
Decision-Focused On-Policy Learning for Contextual Linear Optimization with Partial Feedback
arXiv cs.LG
arXiv:2606.01081v1 Announce Type: new
Decision-focused learning (DFL) trains predictive models by optimizing downstream decision quality rather than standalone prediction accuracy. For contextual linear optimization, most existing DFL methods assume offline data and full observations of the objective cost vector. We develop an on-policy learning method for sequential contextual linear optimization under partial feedback, generalizing the standard bandit feedback setting.
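To make the distinction between prediction accuracy and decision quality concrete, here is a minimal illustrative sketch. It is not the paper's method. The three-vertex feasible set, the cost values, and the helper names (`solve`, `mse`, `regret`, `bandit_feedback`) are all assumptions chosen for illustration. It shows that a cost prediction with lower squared error can still induce a worse decision, and what a bandit-style observation (only the realized cost of the chosen decision) might look like.

```python
# Hypothetical toy setup: pick one of three options (vertices of a simplex)
# to minimize a linear cost c . z. All numbers are made up for illustration.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def solve(cost, vertices):
    """Linear optimization oracle: the vertex minimizing cost . z."""
    return min(vertices, key=lambda z: dot(cost, z))

def mse(pred, true):
    """Mean squared error between predicted and true cost vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

def regret(pred, true, vertices):
    """Extra true cost incurred by acting on the prediction
    instead of on the true cost vector."""
    z_pred = solve(pred, vertices)
    z_star = solve(true, vertices)
    return dot(true, z_pred) - dot(true, z_star)

def bandit_feedback(true, z):
    """Bandit-style feedback: only the scalar cost of the chosen
    decision is revealed, not the full cost vector."""
    return dot(true, z)

VERTICES = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
TRUE_COST = (1.0, 2.0, 3.0)

# PRED_A is close to the truth in squared error but ranks option 2 first.
PRED_A = (2.1, 1.9, 3.0)
# PRED_B is far off in squared error but preserves the correct ranking.
PRED_B = (3.0, 4.0, 5.0)

if __name__ == "__main__":
    for name, pred in (("A", PRED_A), ("B", PRED_B)):
        z = solve(pred, VERTICES)
        print(name, "mse:", mse(pred, TRUE_COST),
              "regret:", regret(pred, TRUE_COST, VERTICES),
              "observed cost:", bandit_feedback(TRUE_COST, z))
```

In this toy case, prediction A has the smaller squared error yet selects a suboptimal option, while prediction B has the larger error and zero regret. A decision-focused objective would prefer B, and under bandit feedback the learner would see only the single scalar returned by `bandit_feedback`.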