Fine-grained Fragment Retrieval in Multi-modal Long-form Dialogues
Signal: 72
Hype: 18
In three lines
A new Fine-grained Fragment Retrieval (FFR) approach retrieves coherent multi-utterance, multi-image fragments from long-form multimodal dialogues.
Two models are proposed: F2RVLM (generation plus RL with multi-objective rewards) for single-dialogue retrieval, and FFRS (two-stage indexing and retrieval) for corpus-scale retrieval.
The MLDR dataset is also introduced, with superior performance reported on benchmarks.
Summary generated by Claude — human-verified