Quality-constrained Entropy Maximization Policy Optimization for LLM Diversity

arXiv:2602.15894v2 Announce Type: replace-cross

In many large language model (LLM) alignment applications, users expect not only high-quality outputs but also substantial diversity. However, existing methods often face a fundamental trade-off between these two objectives: approaches that improve output quality tend to reduce diversity, while methods that increase diversity often do so at the expense of quality.