AI RESEARCH

Auditing LLM Benchmarks with Item Response Theory

arXiv CS.CL • June 01, 2026

arXiv:2605.30504v1 Announce Type: new LLM benchmark labels are frozen at release and silently propagated into downstream benchmarks, errors and all. We
