Disentanglement-Based Equivariant Learning for Compositional VQA
arXiv cs.LG
arXiv:2606.02168v1 Announce Type: cross Compositional visual question answering (VQA) is a challenging and fundamental task that requires models to understand novel combinations of previously learned concepts. Current methods often overlook the disentanglement of the underlying concepts and are limited in their ability to capture the compositional variation mechanism. Moreover, state-of-the-art techniques depend on additional clues for