Dissecting Embodied Abilities in Multimodal Language Models through Skill-level Evaluation and Diagnosis

arXiv:2510.08759v2 Announce Type: replace Understanding the capability bottlenecks of embodied multimodal large language models (MLLMs) is crucial for improving embodied agents. However, existing embodied benchmarks mainly focus on task-level evaluation and fail to provide actionable insights into the underlying causes of model failures. To address this limitation, we