AI RESEARCH

When Mean CE Fails: Median CE Can Better Track Language Model Quality

arXiv CS.AI

arXiv:2605.24667v1 Announce Type: new Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during