GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation

arXiv:2605.22036v1 Announce Type: new Despite significant progress in Vision-Language Navigation (VLN), existing approaches still rely on dense RGB videos that produce excessive patch tokens and lack explicit spatial structure, resulting in substantial computational overhead and limited spatial reasoning. To address these issues, we