PinPoint: Prompting with Informative Interior Points

arXiv:2605.26689v1 Announce Type: cross Modern referring image segmentation pipelines couple a vision-language model (VLM) for grounding with a promptable segmenter such as the Segment Anything Model (SAM) for mask generation. Prior