AI RESEARCH

Representation-Conditioned Diffusion Models for Guided Training Data Generation

arXiv CS.LG

arXiv:2605.27495v1 Announce Type: cross

Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning. In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and.