AI RESEARCH
When Mean CE Fails: Median CE Can Better Track Language Model Quality
arXiv cs.AI
arXiv:2605.24667v1 Announce Type: new Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during