AI RESEARCH
Feature Alignment Determines Fusion Strategy: A Comparative Study of Cross-Attention and Concatenation in Multimodal Learning
arXiv CS.LG
arXiv:2606.01207v1 Announce Type: cross
The choice between cross-attention and concatenation for multimodal fusion remains governed by practitioner intuition rather than principled understanding. In this paper, we demonstrate that feature alignment quality, not data scale alone, is the primary determinant of which fusion strategy excels.
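For readers unfamiliar with the two fusion strategies the abstract compares, the following is a minimal NumPy sketch of each. It is not the paper's implementation: the function names, the mean-pooling step, the token counts, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np

def concat_fusion(a, b):
    """Concatenation fusion (illustrative): mean-pool each modality over
    its tokens, then join the two pooled vectors end to end."""
    return np.concatenate([a.mean(axis=0), b.mean(axis=0)])

def cross_attention_fusion(a, b):
    """Single-head cross-attention (illustrative, no learned projections):
    tokens of `a` act as queries, tokens of `b` as keys and values."""
    d = a.shape[-1]
    scores = a @ b.T / np.sqrt(d)                  # (n_a, n_b) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row sums to 1
    return weights @ b                             # (n_a, d) mixtures of b's tokens

# Hypothetical inputs: 4 text tokens and 6 image patches, both 8-dimensional.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
image = rng.normal(size=(6, 8))

fused_concat = concat_fusion(text, image)          # shape (16,)
fused_attn = cross_attention_fusion(text, image)   # shape (4, 8)
```

In this sketch, concatenation never compares the two modalities directly, whereas cross-attention weights each image patch by its dot-product similarity to a text token. That dependence on similarity in a shared space is one plausible reason alignment quality would matter for the choice between them.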