arXiv cs.CL

Do Text Edits Generalize to Visual Generation? Benchmarking Cross-Modal Knowledge Editing in UMMs

Signal: 78
Hype: 25
In three lines

UniKE, the first benchmark for cross-modality knowledge editing in unified multimodal models (UMMs), reveals a critical gap: text-side efficacy reaches 92%, but VQA accuracy in image generation drops to 18.5%. A reasoning-augmented parameter editing method improves results by up to +18.6 percentage points.
Benchmarks · Vision · Fine-tuning · Papers

Summary generated by Claude — human-verified