Positional Encodings Anchor Spatial Structure in Vision Transformers: A Geometric Perspective on Robustness

arXiv:2606.00124v1 Announce Type: cross Positional embeddings (PEs) in Vision Transformers (ViTs) are known to impact performance and robustness, but their role in shaping internal spatial representations is not well understood. In this work, we study how different forms of PEs influence the representational geometry of ViTs and how these changes relate to robustness under content-disrupting distribution shifts. We