Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

arXiv:2605.27567v1 Announce Type: new Causal discovery is a cornerstone of scientific reasoning, yet whether large language models can perform it reliably remains an open question. Recent benchmarks show that even fine-tuned models plateau on simple causal graphs and degrade as complexity grows, but why they fail has not been established.