arXiv cs.CL

DECOR: Auditing LLM Deception via Information Manipulation Theory

Signal: 78
Hype: 25
In three lines: DECOR is a multi-agent framework for auditing deception in LLMs. It decomposes contexts into atomic informational units and scores four manipulation dimensions (including omission, focus-shifting, and meaning-obscuring). Tested on 15 frontier models, it achieves state-of-the-art deception detection on single- and multi-turn benchmarks and produces interpretable manipulation profiles.
Tags: Multi-agent, AI safety, Alignment, Evals, Benchmarks

Summary generated by Claude — human-verified