AI RESEARCH

STREAM: A Data-Centric Framework for Mining High-Value Task-Oriented Dialogues from Streaming Media

arXiv CS.AI

arXiv:2605.25162v1 Announce Type: cross

Large language models for vertical domains are bottlenecked by the scarcity of complex, domain-specific task-oriented dialogues. Existing data acquisition pipelines face a persistent trilemma: expert annotation is expensive, real-world service conversations are constrained by privacy and commercial restrictions, and static corpora quickly become stale. We propose STREAM, a data-centric framework that leverages publicly available streaming media (live streams and short videos) to synthesize high-value service dialogues at scale.