RePercENT: Scaling Disentangled Representation Learning Beyond Two Modalities

arXiv:2606.05109v1 Announce Type: new

To leverage the full potential of multimodal data, we need representations that go beyond state-of-the-art alignment and fusion approaches and exploit all cross-modal interactions without sacrificing modality-specific information. Learning disentangled representations is a principled way to identify the underlying shared and unique factors that are hidden in observational data.