AI RESEARCH
An Open-Source Benchmark and Baseline for Multi-temporal Referring Segmentation
arXiv CS.AI
arXiv:2606.00987v1 Announce Type: cross Large Vision-Language Models (LVLMs) have shown strong visual understanding and language-guided grounding abilities, yet their capacity for multi-temporal visual reasoning remains underexplored. To bridge this gap, we