RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation

arXiv:2507.02792v5 Announce Type: replace

Text-to-image (T2I) diffusion models have shown remarkable success in generating high-quality images from text prompts. Recent efforts extend these models to incorporate conditional images (e.g., Canny edge maps) for fine-grained spatial control. Among them, feature injection methods have emerged as a