arXiv cs.AI

Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models

Signal: 78
Hype: 25
In three lines: Fourier Compressor compresses visual tokens in Vision-Language Models using Fourier transforms. The parameter-free method reduces FLOPs by 83.8% and boosts inference speed by 31.2% while retaining 96% of original accuracy. Tested on LLaVA and Qwen-VL, it generalizes to video understanding tasks.
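To make the frequency-domain idea concrete, here is a minimal sketch of parameter-free token compression by low-pass filtering. It is an illustration under stated assumptions, not the paper's algorithm: the summary does not give the exact procedure, so the function name `lowpass_compress_tokens`, the 24x24 input grid, the 12x12 output grid, and the centered-crop strategy are all hypothetical choices.

```python
import numpy as np

def lowpass_compress_tokens(tokens, grid_hw, keep_hw):
    """Hypothetical sketch: shrink an (H*W, D) visual token grid to (h*w, D)
    by keeping only the lowest spatial frequencies. No learned parameters."""
    H, W = grid_hw
    h, w = keep_hw
    D = tokens.shape[1]
    grid = tokens.reshape(H, W, D)

    # 2D FFT over the spatial axes; fftshift moves the DC term to the center.
    spec = np.fft.fftshift(np.fft.fft2(grid, axes=(0, 1)), axes=(0, 1))

    # Crop a centered h x w window, i.e. keep the low-frequency coefficients.
    top, left = H // 2 - h // 2, W // 2 - w // 2
    cropped = spec[top:top + h, left:left + w, :]

    # Inverse FFT on the smaller grid. The real part is taken because the
    # cropped spectrum is not exactly Hermitian-symmetric for even sizes.
    small = np.fft.ifft2(np.fft.ifftshift(cropped, axes=(0, 1)), axes=(0, 1)).real

    # Rescale so the per-channel mean of the tokens is preserved.
    small *= (h * w) / (H * W)
    return small.reshape(h * w, D)

# Assumed example sizes: 576 tokens (24x24 grid) reduced to 144 (12x12 grid).
rng = np.random.default_rng(0)
visual_tokens = rng.standard_normal((24 * 24, 8))
compressed = lowpass_compress_tokens(visual_tokens, (24, 24), (12, 12))
```

In this sketch the token count drops by 75%. That figure follows only from the assumed grid sizes and is unrelated to the FLOPs and speed numbers reported in the summary.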
Vision · Benchmarks · Infrastructure

Summary generated by Claude — human-verified