EMA-Gated Temporal Sequence Compression in Vision Transformers [P]
NeuroFlow is a dynamic routing framework for Vision Transformer (ViT) video inference. It exploits temporal redundancy by tracking an Exponential Moving Average (EMA) of patch-level embeddings and eliminating tokens that stay stationary. Architecture B achieves a 55.80× wall-clock speedup (678 ms → 11.9 ms on SigLIP at 1792p) at 97.37% embedding fidelity. The code has been released.
Summary generated by Claude — human-verified
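The summary describes EMA-based gating of patch tokens only at a high level. The sketch below shows one plausible form of that idea and is not NeuroFlow's actual implementation. Everything in it is an assumption for illustration: the class name `EMATokenGate`, the parameters `alpha` and `threshold`, and the gating rule itself, which keeps a patch only when the L2 distance between its current embedding and its running EMA is at least `threshold`. Patches below the threshold are treated as stationary and dropped. In a real pipeline they would presumably reuse cached outputs, but that detail is not stated in the source.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EMATokenGate:
    """Hypothetical EMA gate: keeps only patches whose embedding drifts from its running average."""

    def __init__(self, alpha: float = 0.5, threshold: float = 0.1) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha          # weight of the newest frame in the EMA (assumed parameter)
        self.threshold = threshold  # L2 drift below which a patch counts as stationary (assumed rule)
        self.ema: List[Vector] = []  # one running average per patch position

    def step(self, patches: List[Vector]) -> List[int]:
        """Return indices of patches to keep for this frame, then update the per-patch EMA."""
        if not self.ema:
            # First frame: there is no history to compare against, so keep every token.
            self.ema = [list(p) for p in patches]
            return list(range(len(patches)))
        if len(patches) != len(self.ema):
            raise ValueError("patch count must stay constant across frames")
        keep = []
        for i, (p, avg) in enumerate(zip(patches, self.ema)):
            # Gate: a patch that moved far enough from its running average is kept.
            if l2_distance(p, avg) >= self.threshold:
                keep.append(i)
            # EMA update for every patch: avg <- alpha * p + (1 - alpha) * avg
            self.ema[i] = [self.alpha * x + (1.0 - self.alpha) * m for x, m in zip(p, avg)]
        return keep


if __name__ == "__main__":
    gate = EMATokenGate(alpha=0.5, threshold=0.1)
    frame0 = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    frame1 = [[0.0, 0.0], [1.0, 1.0], [3.0, 2.0]]  # only patch 2 changes
    print(gate.step(frame0))  # [0, 1, 2]
    print(gate.step(frame1))  # [2]
```

Comparing each patch against a running average, rather than only against the previous frame, smooths out frame-to-frame jitter. In this sketch, `alpha` controls how quickly that reference adapts to genuine scene changes.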