Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs
Signal: 72
Hype: 18
In three lines
Fre-Res introduces adaptive video-token compression for video MLLMs. The framework separates spatial detail, kept as high-fidelity anchor tokens, from temporal evolution, encoded as residual-frequency tokens via a 1D DCT. A Spatial-Guided Absorber aligns the frequency dynamics with the visual embeddings. Results: near full-token performance with a substantial reduction in token length across short- and long-video benchmarks.
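To make the anchor-plus-residual idea concrete, here is a minimal sketch of one way to split a clip's tokens into a spatial anchor and low-frequency temporal coefficients using a 1D DCT along the time axis. It is an illustration under assumptions, not the Fre-Res implementation: the choice of the first frame as anchor, the fixed `keep` cutoff, and the names `dct_matrix`, `compress_clip`, and `reconstruct` are all hypothetical, and the Spatial-Guided Absorber is not modeled at all.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n); its transpose is its inverse."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    t = np.arange(n)[None, :]  # time index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return m


def compress_clip(tokens: np.ndarray, keep: int):
    """Split (T, N, D) frame tokens into an anchor and `keep` temporal coefficients.

    Assumption for this sketch: the first frame is the anchor and the lowest
    `keep` DCT frequencies of the residual are retained.
    """
    anchor = tokens[0]                       # (N, D) spatial detail
    residual = tokens - anchor               # (T, N, D) temporal change
    basis = dct_matrix(tokens.shape[0])      # (T, T)
    coeffs = np.tensordot(basis, residual, axes=(1, 0))  # DCT along time
    return anchor, coeffs[:keep]             # drop high frequencies


def reconstruct(anchor: np.ndarray, coeffs: np.ndarray, num_frames: int) -> np.ndarray:
    """Approximate the original (T, N, D) tokens from anchor and kept coefficients."""
    basis = dct_matrix(num_frames)
    keep = coeffs.shape[0]
    # Inverse DCT restricted to the kept frequencies.
    residual = np.tensordot(basis[:keep].T, coeffs, axes=(1, 0))
    return anchor + residual
```

In this toy setup, a clip of `T` frames with `N` tokens each is represented by `(1 + keep) * N` vectors instead of `T * N`. With `keep == T` the reconstruction is exact; with smaller `keep`, the error depends on how smoothly the tokens change over time.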
Summary generated by Claude — human-verified