AI RESEARCH
BarrierSteer: LLM Safety via Learning Barrier Steering
arXiv CS.AI
arXiv:2602.20102v2 Announce Type: replace-cross
Despite the strong performance of large language models (LLMs) across diverse tasks, their susceptibility to adversarial attacks and unsafe content generation remains a significant obstacle to deployment, particularly in high-stakes settings. Addressing this challenge requires safety mechanisms that are both practically effective and theoretically grounded. In this paper, we