Faster or Stronger: Towards Flexible Visual Place Recognition via Weighted Aggregation and Token Pruning

arXiv:2605.20551v1

Visual Place Recognition (VPR) aims to match a query image to reference images of the same place in a large-scale database. Recent state-of-the-art methods employ Vision Transformers (ViTs) as backbone foundation models to extract patch-level features that are robust to viewpoint, illumination, and seasonal variations, which are then aggregated into a compact global descriptor for retrieval.
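
The extract-aggregate-retrieve pipeline described above can be illustrated with a minimal sketch. It is not the method proposed in this paper: it assumes patch features are already available as plain lists of floats, uses a weighted mean as a stand-in aggregator (uniform weights by default), and ranks references by cosine similarity. The function names `aggregate` and `retrieve`, the `weights` parameter, and the toy feature values are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def aggregate(patch_features, weights=None):
    """Pool patch-level features into one L2-normalized global descriptor.

    Assumption: a weighted mean stands in for the aggregation step;
    real VPR systems typically use learned aggregators instead.
    """
    if weights is None:
        weights = [1.0] * len(patch_features)  # uniform weights = mean pooling
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(patch_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features)) / total
        for d in range(dim)
    ]
    return l2_normalize(pooled)


def retrieve(query_desc, database_descs, top_k=1):
    """Return indices of the top_k references by cosine similarity.

    Descriptors are unit length, so the dot product is the cosine similarity.
    """
    scores = [sum(q * r for q, r in zip(query_desc, ref)) for ref in database_descs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Toy data: two reference places and one query, two 3-D patches each.
place_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
place_b = [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]
query = [[0.8, 0.2, 0.0], [1.0, 0.0, 0.1]]

database = [aggregate(place_a), aggregate(place_b)]
best = retrieve(aggregate(query), database)
print(best)  # [0]: the query is closest to place_a
```

Because every descriptor is normalized once at aggregation time, retrieval over the database reduces to dot products, which is what makes a compact global descriptor practical at large scale.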