RoadTones: Tone Controllable Text Generation from Road Event Videos

arXiv:2605.21411v1 Announce Type: new Existing video-language models can generate factual descriptions of road events but offer no control over how these events are expressed: their tone, urgency, or style. This limits deployment in communication-critical settings, where the effectiveness of a message depends on its presentation as well as its factual content. To mitigate this, we