Manboformer: Learning Gaussian Representations via Spatial-temporal Attention Mechanism

arXiv:2503.04863v2 Announce Type: replace-cross In 3D semantic occupancy prediction for autonomous driving, GaussianFormer proposed an object-centric alternative to voxel-based grid prediction: describing a scene with a sparse set of 3D semantic Gaussians, a scheme with lower memory requirements. Each 3D Gaussian represents a flexible region of interest together with its semantic features, which are iteratively refined by an attention mechanism.
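
To make the sparse-Gaussian idea concrete, the following is a minimal sketch and not the GaussianFormer or Manboformer implementation. It assumes axis-aligned Gaussians (no rotation), per-class semantic weights, and a simple weighted sum to read out the semantics at a query point. The names `SemanticGaussian`, `density`, `semantic_at`, and `predict_class`, and the two-object scene, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SemanticGaussian:
    """One scene primitive: a region of interest plus its semantic features."""
    mean: Tuple[float, float, float]   # center of the region
    scale: Tuple[float, float, float]  # per-axis std dev (assumption: no rotation)
    semantics: List[float]             # per-class weights (assumption)


def density(g: SemanticGaussian, point: Sequence[float]) -> float:
    """Unnormalized Gaussian weight of `point`; equals 1.0 at the mean."""
    sq_dist = sum(((p - m) / s) ** 2 for p, m, s in zip(point, g.mean, g.scale))
    return math.exp(-0.5 * sq_dist)


def semantic_at(gaussians: Sequence[SemanticGaussian],
                point: Sequence[float]) -> List[float]:
    """Sum each Gaussian's semantic weights, scaled by its density at `point`."""
    scores = [0.0] * len(gaussians[0].semantics)
    for g in gaussians:
        w = density(g, point)
        for c, s in enumerate(g.semantics):
            scores[c] += w * s
    return scores


def predict_class(gaussians: Sequence[SemanticGaussian],
                  point: Sequence[float]) -> int:
    """Index of the class with the highest accumulated score at `point`."""
    scores = semantic_at(gaussians, point)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical scene: class 0 = "car", class 1 = "tree".
scene = [
    SemanticGaussian(mean=(0.0, 0.0, 0.0), scale=(2.0, 1.0, 1.0), semantics=[1.0, 0.0]),
    SemanticGaussian(mean=(10.0, 0.0, 0.0), scale=(1.0, 1.0, 3.0), semantics=[0.0, 1.0]),
]
```

In this sketch the scene is stored as two primitives rather than a dense voxel grid, so memory grows with the number of objects and not with the grid resolution. The attention-based refinement described in the abstract would update each Gaussian's `mean`, `scale`, and `semantics`; that step is not modeled here.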