When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?

arXiv:2605.30719v1 Announce Type: cross

We study when large language models (LLMs) can serve as effective black-box policy optimizers for reinforcement learning (RL) tasks, i.e., when we can replace classical RL algorithms with an LLM. We explore this question by