AI RESEARCH
CSULoRA: Closest Safe Update Low-Rank Adaptation
arXiv cs.LG
arXiv:2605.30640v1 Announce Type: new
Low-rank adaptation has become a standard method for parameter-efficient fine-tuning of large language models, but even small amounts of unsafe or adversarial fine-tuning data can substantially weaken the safety behavior of aligned models. Existing safety-preserving LoRA methods often rely on hard interventions such as projection, pruning, thresholding, or additional