Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach

arXiv:2511.04393v2

Large language models (LLMs) are increasingly deployed as "agents" for decision-making (DM) in interactive and dynamic environments. Yet LLMs were not originally designed for DM, and recent studies show that they can struggle even in basic online DM problems, failing to achieve low regret or an effective exploration-exploitation tradeoff. To address this, we