arXiv cs.AI

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

Signal: 72 · Hype: 25
In three lines

VISAFF is a tuning-free framework for Emotion Recognition in Conversation (ERC) built on vision-language models (VLMs). It combines two stages: speaker-centered affective grounding and reliability-guided affective complementation. By leveraging the reasoning capabilities of frozen VLMs and integrating visual, textual, and acoustic signals, it improves recognition accuracy without costly fine-tuning.
Vision · Multi-agent · Papers · Evals

Summary generated by Claude — human-verified