FROST-STA: Frozen Dense Features for the Ego4D Short-Term Object Interaction Anticipation

arXiv:2606.00694v1

Short-term anticipation in egocentric video requires more than recognizing the current scene: a system must infer which object the camera wearer will contact, which action will follow, and how soon the contact will happen. This report describes FROST-STA, our submission to the Ego4D Short-Term Object Interaction Anticipation (STA) Challenge at EgoVis 2026. For each query time, the model produces a ranked set of structured hypotheses, each containing an active-object box, a noun label, a verb label, a time-to-contact (TTC), and a confidence.
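To make the output format concrete, here is a minimal Python sketch of one structured hypothesis and of ranking a set of them by confidence. It is an illustration only, not the authors' implementation: the class name `StaHypothesis`, the helper `rank_hypotheses`, the field names, the `(x1, y1, x2, y2)` box convention, the use of seconds for TTC, and the `top_k` cutoff are all assumptions that do not appear in the report.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StaHypothesis:
    """One anticipated interaction for a single query time (hypothetical schema)."""
    box: Tuple[float, float, float, float]  # active-object box; (x1, y1, x2, y2) is assumed
    noun: str          # label of the object expected to be contacted
    verb: str          # label of the action expected to follow
    ttc: float         # time-to-contact; seconds is an assumed unit
    confidence: float  # score used for ranking


def rank_hypotheses(hyps: List[StaHypothesis], top_k: int = 5) -> List[StaHypothesis]:
    """Return the top_k hypotheses, highest confidence first (top_k is an assumed cutoff)."""
    return sorted(hyps, key=lambda h: h.confidence, reverse=True)[:top_k]
```

A caller would build one list of `StaHypothesis` objects per query time and pass it to `rank_hypotheses` to obtain the ranked set.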