EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

arXiv:2605.28101v1 Announce Type: cross Predicting spatially varying Room Impulse Responses (RIRs) from sparse observations is a critical but highly challenging inverse problem for immersive spatial audio rendering. In this work, we present EIGENET, a geometry-informed multi-modal framework for few-shot novel view RIR prediction. At its core is a Cross-view Alternate-attention Transformer that iteratively alternates between refining local intra-view acoustic structures and global cross-view spatial relationships.
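
The alternation between intra-view and cross-view attention can be illustrated with a minimal sketch. This is not the EIGENET implementation: the abstract does not specify the layer design, so everything below is an assumption made for illustration. The function names (`self_attention`, `alternate_attention`), the `num_rounds` parameter, and the token layout `tokens[view][token][dim]` are hypothetical. The sketch uses plain scaled dot-product self-attention with identity query/key/value projections and omits learned weights, multiple heads, residual connections, normalization, and feed-forward layers. The "global" step is assumed to attend over the tokens of all views jointly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: identity Q/K/V projections (no learned weights), so each
    output vector is a convex combination of the input vectors.
    """
    dim = len(seq[0])
    scale = math.sqrt(dim)
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in seq]
        weights = softmax(scores)
        out.append([sum(w * vec[i] for w, vec in zip(weights, seq))
                    for i in range(dim)])
    return out


def alternate_attention(tokens, num_rounds=2):
    """Alternate intra-view and global cross-view attention.

    tokens[v][t] is the feature vector of token t in view v (hypothetical
    layout). Each round runs:
      1. intra-view attention: tokens attend only within their own view;
      2. cross-view attention: all tokens of all views attend jointly.
    """
    num_tokens = len(tokens[0])
    for _ in range(num_rounds):
        # Local step: refine structure inside each view independently.
        tokens = [self_attention(view) for view in tokens]
        # Global step: flatten all views so tokens can exchange information.
        flat = self_attention([tok for view in tokens for tok in view])
        # Restore the [view][token][dim] layout.
        tokens = [flat[v * num_tokens:(v + 1) * num_tokens]
                  for v in range(len(tokens))]
    return tokens


if __name__ == "__main__":
    # Three toy "views", each with two 2-dimensional tokens.
    views = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [1.0, 1.0]],
        [[0.0, 0.0], [0.2, 0.8]],
    ]
    refined = alternate_attention(views, num_rounds=2)
    print(len(refined), len(refined[0]), len(refined[0][0]))
```

Separating the two steps keeps the local pass cheap (its cost grows with the number of tokens per view) while the global pass is the only place where information crosses views. A practical model would add learned projections and residual paths around both steps.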