AI RESEARCH

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation

arXiv CS.LG

ArXi:2605.21834v1 Announce Type: new Aligned models can misbehave in several ways: they are often sycophantic, fall victim to jailbreaks, or fail to include appropriate safety warnings. Consistency