AI RESEARCH

From Knowing to Doing: A Memory-Controlled Benchmark for LLM Trading Agents on Stock Markets

arXiv CS.AI

arXiv:2605.28359v1 Announce Type: new Evaluating whether large language model (LLM) agents can profit in capital markets is increasingly framed as end-to-end trading: place an agent in a historical market, let it trade, and measure portfolio returns. This setup is vulnerable to two evaluation failures. First, long backtests often overlap with the knowledge cutoffs of frontier LLMs, allowing memorized tickers, dates, prices, and market narratives to substitute for investment reasoning.