LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

arXiv:2605.26781v1 Announce Type: new

Advanced Large Multimodal Models (LMMs) have demonstrated impressive performance on K-12 reasoning tasks, showing great promise as intelligent tutors. Realizing this potential requires models to handle real-world examinations effectively, yet most existing benchmarks fail to capture the complexity of authentic testing environments. Specifically, most datasets are static and therefore prone to data contamination, and they are often confined to a narrow range of modalities, disciplines, and evaluation criteria. To address these issues, we …