AI RESEARCH
LVDrive: Latent Visual Representation Enhanced Vision-Language-Action Autonomous Driving Model
arXiv cs.CV
arXiv:2605.22089v1 Announce Type: new Vision-Language-Action (VLA) models have emerged as a promising framework for end-to-end autonomous driving. However, existing VLAs typically rely on sparse action supervision, which underutilizes their powerful scene understanding and reasoning capabilities. Recent attempts to incorporate dense visual supervision via world modeling often overemphasize pixel-level image reconstruction, neglecting semantically meaningful scene representation learning.