Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment

arXiv:2606.04737v1

Large-scale video generation models have made remarkable progress in semantic consistency and visual quality, producing increasingly coherent and convincing videos. Nevertheless, the dynamics learned through pixel-level fitting do not naturally respect the regularities that govern real-world motion and interaction, so the generated videos still fall short in physical plausibility.