Decomposing and Measuring Evaluation Awareness

arXiv CS.AI

arXiv:2605.23055v1 Announce Type: cross Frontier language models sometimes recognize that they are being evaluated and adjust their behavior, undermining the validity of benchmark results. Yet the field studies this evaluation awareness without a shared foundation, conflating properties of the evaluation with properties of the model, and detection with behavioral response. We ground evaluation awareness in social psychology, decomposing it into an environment component (how recognizable the task is as an evaluation) and a model component that separates recognition from the propensity to act on it.