Reddit r/MachineLearning

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

Signal 72 · Hype 35
In three lines: NeuroFlow is a dynamic routing framework for Vision Transformer (ViT) video inference. It exploits temporal redundancy by tracking an exponential moving average (EMA) of patch-level embeddings across frames and pruning tokens for stationary patches. Architecture B reports a 55.80× wall-clock speedup (678 ms → 11.9 ms on SigLIP 1792p) at 97.37% embedding fidelity. Code is released.
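The summary does not state NeuroFlow's exact gating rule, so the following is only a minimal sketch of the general idea of EMA-gated token pruning, not the released code. It assumes one EMA vector per patch position, cosine similarity as the stationarity test, and two hypothetical parameters, `alpha` (EMA smoothing factor) and `threshold` (similarity above which a patch is treated as stationary). What happens to skipped tokens downstream (for example, reusing cached outputs) is also left out, since the summary does not describe it.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EMATokenGate:
    """Hypothetical per-patch EMA gate (illustrative only, not NeuroFlow's API).

    Assumes every frame yields the same number of patch embeddings,
    in the same spatial order.
    """

    def __init__(self, alpha: float = 0.1, threshold: float = 0.99):
        self.alpha = alpha          # assumed EMA smoothing factor in (0, 1]
        self.threshold = threshold  # assumed stationarity cutoff on cosine similarity
        self.ema: List[Vector] = []  # one running-average vector per patch position

    def step(self, patches: List[Vector]) -> List[int]:
        """Return indices of patches whose tokens should be recomputed this frame."""
        if not self.ema:
            # First frame: no history yet, so every token is kept.
            self.ema = [list(p) for p in patches]
            return list(range(len(patches)))

        keep: List[int] = []
        for i, patch in enumerate(patches):
            # A patch that still matches its running average is treated as stationary.
            if cosine_similarity(patch, self.ema[i]) < self.threshold:
                keep.append(i)
            # Update the EMA: ema <- (1 - alpha) * ema + alpha * current.
            self.ema[i] = [
                (1.0 - self.alpha) * e + self.alpha * x
                for e, x in zip(self.ema[i], patch)
            ]
        return keep


if __name__ == "__main__":
    gate = EMATokenGate(alpha=0.5, threshold=0.99)
    print(gate.step([[1.0, 0.0], [0.0, 1.0]]))  # first frame keeps both patches
    print(gate.step([[1.0, 0.0], [1.0, 0.0]]))  # only patch 1 changed
```

In this sketch, only the kept indices would be passed through the ViT, so the compute saved scales with the fraction of patches judged stationary. A smaller `alpha` makes the reference slower to adapt, which prunes more aggressively in static scenes but may react later to gradual motion.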
Vision · Papers · Open source · Infrastructure

Summary generated by Claude — human-verified