AI RESEARCH
Structured Relational Reasoning for Group Activity Assessment
arXiv cs.CV
arXiv:2508.07996v2 Announce Type: replace
Group Activity Detection (GAD) involves recognizing social groups and their collective behaviors in videos. Vision Foundation Models (VFMs) such as DINOv2 provide strong features but are pretrained on object-centric data. We find that naively substituting them into existing GAD pipelines degrades performance, exposing structured, group-aware decoding as the true bottleneck.