arXiv cs.AI · 19 May 2026

VISAFF: Speaker-Centered Visual Affective Feature Learning for Emotion Recognition in Conversation

In three lines: VISAFF is a framework for Emotion Recognition in Conversation (ERC) built on vision-language models (VLMs). It works in two stages: speaker-centered affective grounding, followed by reliability-guided affective complementation. The approach is tuning-free: it relies on the reasoning capabilities of frozen VLMs and integrates visual, textual, and acoustic signals to improve accuracy without expensive fine-tuning.

Vision Multi-agent Papers Evals

Summary generated by Claude — human-verified
