Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures

arXiv:2505.24069v4 Announce Type: replace-cross

Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic benchmark for evaluating these capabilities. We propose to use data structures as a principled lens: as fundamental building blocks of algorithms, they naturally probe structural reasoning, that is, the ability to understand and manipulate relationships such as order, hierarchy, and connectivity that underpin algorithmic reasoning. We