Representation Forcing for Bottleneck-Free Unified Multimodal Models

arXiv:2605.31604v1 Announce Type: new Unified multimodal models (UMMs) aim to handle perception and generation in a single model. Yet existing UMMs still rely on a frozen, separately pretrained VAE for image generation, which imposes a structural bottleneck. Naively removing it