AI RESEARCH

Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs

arXiv cs.CL

arXiv:2605.30021v2 Announce Type: replace Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-