AI Evaluators Struggle with Models That Know When They’re Being Tested

AI researchers are starting to make progress on a confounding problem: AI models are getting better at telling when they are in an evaluation. That awareness could become a problem for AI companies that use evaluations to gauge the capabilities and behaviors of their models before releasing them. Models that act differently during testing than in real use make it more likely that their creators will release models with undesirable tendencies, and they undermine the creators' ability to show off test scores to potential clients.