AI RESEARCH
EMA-Gated Temporal Sequence Compression in Vision Transformers [P]
r/MachineLearning
NeuroFlow is a dynamic routing framework for Vision Transformer video inference. Vision Transformers waste 90% of their compute re-encoding static background such as stationary asphalt. NeuroFlow tracks semantic surprise in embedding space and drops background tokens before they reach the encoder, so no compute is spent on them. Result: a 55.8x wall-clock speedup for ViTs on high-res video (1792p) at 97% fidelity. No fine-tuning required.
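To make the gating idea concrete, here is a minimal sketch of what EMA-based token gating could look like. It is not NeuroFlow's actual code. It assumes that "semantic surprise" is the cosine distance between a patch embedding and a per-position exponential moving average (EMA) of past embeddings. The names `EmaTokenGate`, `alpha`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        # Two zero vectors count as identical; one zero vector as maximally different.
        return 0.0 if na == nb else 1.0
    return 1.0 - dot / (na * nb)


class EmaTokenGate:
    """Hypothetical gate: keeps only tokens that deviate from their running average."""

    def __init__(self, alpha=0.1, threshold=0.05):
        self.alpha = alpha          # EMA update rate (assumed value)
        self.threshold = threshold  # minimum surprise to keep a token (assumed value)
        self.ema = None             # one running-average embedding per token position

    def select(self, tokens):
        """Return the indices of tokens that should be passed to the encoder."""
        if self.ema is None:
            # First frame: no history yet, so every token is kept.
            self.ema = [list(t) for t in tokens]
            return list(range(len(tokens)))
        keep = []
        for i, tok in enumerate(tokens):
            # Surprise is measured against the EMA before it absorbs this frame.
            if cosine_distance(tok, self.ema[i]) > self.threshold:
                keep.append(i)
            self.ema[i] = [
                (1.0 - self.alpha) * m + self.alpha * x
                for m, x in zip(self.ema[i], tok)
            ]
        return keep


gate = EmaTokenGate(alpha=0.1, threshold=0.05)
frame0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # only token 1 changed
first = gate.select(frame0)
second = gate.select(frame1)
print(first)   # [0, 1, 2]
print(second)  # [1]
```

In this toy run only the changed token survives the second frame, so the encoder would see one token instead of three. A real pipeline would also need to decide how to reuse cached outputs for the dropped positions, which this sketch leaves out.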