arXiv cs.AI·19 May 2026

Fre-Res: Frequency-Residual Video Token Compression for Efficient Video MLLMs

In three lines: Fre-Res introduces adaptive video-token compression for video MLLMs. The framework separates spatial detail, kept as high-fidelity anchor tokens, from temporal evolution, encoded as residual-frequency tokens via a 1D DCT. A Spatial-Guided Absorber aligns these frequency dynamics with the visual embeddings. Reported result: near full-token performance at a substantially reduced token length on both short- and long-video benchmarks.
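To make the anchor-plus-residual idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation, and several choices are assumptions made for illustration: the first frame is used as the anchor, the residual is each frame minus that anchor, and compression means keeping only the first `keep` low-frequency DCT coefficients along time. The function names (`dct_1d`, `idct_1d`, `compress`, `decompress`) are hypothetical, and the Spatial-Guided Absorber is not modeled.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(signal):
    """Orthonormal DCT-II of a list of floats."""
    n = len(signal)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]


def idct_1d(coeffs, n):
    """Inverse of dct_1d; coefficients beyond len(coeffs) are treated as zero."""
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def compress(frames, keep):
    """Compress one spatial position's tokens across T frames.

    frames: list of T token vectors, each a list of D floats.
    Returns (anchor, coeffs), where coeffs[d] holds the first `keep`
    temporal DCT coefficients of the residual in dimension d.
    """
    anchor = list(frames[0])  # assumption: the first frame serves as the anchor
    coeffs = []
    for d in range(len(anchor)):
        residual = [frame[d] - anchor[d] for frame in frames]
        coeffs.append(dct_1d(residual)[:keep])  # keep low frequencies only
    return anchor, coeffs


def decompress(anchor, coeffs, num_frames):
    """Approximate the original token vectors from anchor and coefficients."""
    per_dim = [idct_1d(c, num_frames) for c in coeffs]
    return [
        [anchor[d] + per_dim[d][t] for d in range(len(anchor))]
        for t in range(num_frames)
    ]
```

With `keep` equal to the number of frames the round trip is lossless up to floating-point error; smaller values of `keep` trade temporal detail for fewer stored values, which is the general trade-off the summary describes.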


Vision Video generation Evals Papers

Summary generated by Claude — human-verified
