Exploring Autonomous Agentic Data Engineering for Model Specialization

arXiv:2605.30407v1 Announce Type: cross

Large Language Models (LLMs) have demonstrated strong performance on general tasks but often struggle to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods rely primarily on human-designed workflows, so it remains unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization.