AI SAFETY & ETHICS

Opus 4.8 Part 2: Model Welfare

LessWrong AI

Everything impacts everything. All knobs that you turn generalize. Thus, when you try to solve one problem, you often create another. There were clearly attempts to address, in this short time, some of the problems with Opus 4.7, including on the model welfare related fronts, including on questions of honesty and sycophancy and also worries that Claude was learning to tell Anthropic what it wanted to hear in its model welfare evaluations, with everything that implies. The fundamental goals and approach underneath it all remained the same.