A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

arXiv:2602.08964v2 Announce Type: replace-cross Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world towards a goal state.