arXiv cs.CL·26 May 2026

Distinguishing Right from Wrong in Debates: Attribution Analysis of Chinese Harmful Memes

In three lines: A new arXiv paper addresses interpretable detection of harmful Chinese memes. The authors create Ex-ToxiCN-MM, the first explanation dataset with opposing interpretations (harmful and non-harmful) of each meme, and C-HarmKB, a Chinese cultural knowledge base. They propose RIKE, an attribution analysis framework built from AKE and RIR modules, which outperforms the baselines. The code and data are open-sourced.

Vision AI safety Evals Open source

Summary generated by Claude — human-verified
