arXiv cs.CL

TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models

Signal: 75
Hype: 15
In three lines: TrustLDM is a trustworthiness benchmark for Language Diffusion Models (LDMs) covering safety, privacy, and fairness. Results show that LDM alignment degrades when malicious post contexts are attached to masked responses, regardless of context length. An automatic evaluation framework (TrustLDM-Auto) systematically identifies vulnerable configurations across all tested models.
Tags: Benchmarks, AI safety, Alignment, Evals

Summary generated by Claude — human-verified