Recovering Diversity Without Losing Alignment: A DPO Recipe for Post-Trained LLMs
arXiv cs.CL
arXiv:2605.30021v2 Announce Type: replace
Many open-ended instructions have multiple valid answers that users can benefit from seeing, but post-