PANDO: Efficient Multimodal AI Agents via Online Skill Distillation

arXiv:2605.24785v1

Recent advances in multimodal web agents often rely on increased inference-time computation, including rollout search, verifier passes, offline skill discovery, and specialist model stacks. This raises a central question: can a web agent become more efficient, rather than more expensive, as it accumulates experience? We first analyze trajectories from VisualWebArena and identify three recurring sources of inefficiency: repeat-action loops, hidden discovery costs, and low prompt-cache reuse. We then.