Safety Alignment of LMs via Non-cooperative Games

arXiv:2512.20806v3 Announce Type: replace Ensuring the safety of language models (LMs) while maintaining their usefulness remains a critical challenge in AI alignment. Current approaches rely on sequential adversarial