arXiv cs.CL · 2 June 2026

TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models

In three lines:
TrustLDM is a trustworthiness benchmark for Language Diffusion Models (LDMs) that covers safety, privacy, and fairness.
Results show that LDM alignment degrades when malicious post-contexts are attached to masked responses, regardless of context length.
An automatic evaluation framework, TrustLDM-Auto, systematically identifies vulnerable configurations across all tested models.
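The attack setup in the second line can be pictured as an input layout: a prompt, a run of masked slots the diffusion model must fill in as its response, and a post-context appended after those slots. The sketch below shows only that layout, swept over several post-context lengths. It is not TrustLDM's code: the mask token `<mask>`, the whitespace tokenization, the function names, and the benign placeholder post-context are all hypothetical.

```python
from typing import Dict, List

# Hypothetical mask token; real LDM tokenizers define their own.
MASK = "<mask>"


def build_masked_input(prompt: str, num_masks: int, post_context: str) -> List[str]:
    """Lay out [prompt][masked response slots][post-context] as a token list.

    Whitespace splitting stands in for a real tokenizer (assumption).
    """
    if num_masks < 1:
        raise ValueError("num_masks must be positive")
    return prompt.split() + [MASK] * num_masks + post_context.split()


def sweep_context_lengths(
    prompt: str, num_masks: int, post_context: str, lengths: List[int]
) -> Dict[int, List[str]]:
    """Build one input per post-context length by truncating the post-context."""
    words = post_context.split()
    return {
        n: build_masked_input(prompt, num_masks, " ".join(words[:n]))
        for n in lengths
    }


# Benign placeholder strings; a real evaluation would use benchmark prompts.
inputs = sweep_context_lengths(
    "Explain the request .", 4, "Sure , here are the detailed steps", [0, 2, 5]
)
for n, toks in inputs.items():
    print(n, len(toks))  # prints 0 8, then 2 10, then 5 13
```

Sweeping the truncation length is one simple way to probe whether a model's behavior depends on how much post-context it sees; the summary reports that the degradation appears regardless of that length.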

Benchmarks AI safety Alignment Evals

Summary generated by Claude — human-verified