3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation

arXiv:2605.26500v1 Announce Type: new

Vision-language navigation (VLN) requires an agent to traverse complex 3D environments by following natural language instructions, which demands thorough scene understanding. Existing works equip agents with various scene representations to enhance spatial awareness, but they often neglect the complex 3D geometry and rich semantics of VLN scenes. This limits their ability to generalize to diverse and unseen environments.