AI RESEARCH
Safety Alignment of LMs via Non-cooperative Games
arXiv cs.AI
arXiv:2512.20806v3 Announce Type: replace
Ensuring the safety of language models (LMs) while maintaining their usefulness remains a critical challenge in AI alignment. Current approaches rely on sequential adversarial