arXiv cs.AI·19 May 2026

DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

In three lines: DARC is a retraining-free, inference-time method that recasts response selection as distributionally robust optimization under annotator disagreement. It reranks candidate responses by maximizing KL-robust satisfaction objectives, and it exposes deployment controls that cap the entropic risk premium.
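To make the reranking idea concrete, here is a minimal sketch of risk-aware candidate selection. It is not the paper's implementation: the summary does not give DARC's exact objective, so this uses the standard entropic certainty equivalent, which is the dual form of a KL-constrained worst-case expectation. The function names, the `beta` risk-aversion parameter, the `max_premium` cap, and the annotator scores are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def entropic_value(scores, beta):
    """Entropic certainty equivalent: -(1/beta) * log(mean(exp(-beta * s))).

    Equals the plain mean when all annotators agree and falls below it
    as disagreement grows. `beta` > 0 is an assumed risk-aversion knob.
    """
    m = min(scores)
    # Shift by the minimum so every exponent is <= 0 (numerical stability).
    log_mean_exp = math.log(mean([math.exp(-beta * (s - m)) for s in scores]))
    return m - log_mean_exp / beta


def risk_premium(scores, beta):
    """Gap between the mean score and the entropic value (>= 0 by Jensen)."""
    return mean(scores) - entropic_value(scores, beta)


def select_response(candidates, beta=5.0, max_premium=None):
    """Pick the candidate with the highest entropic value.

    `candidates` maps a candidate id to its per-annotator scores.
    If `max_premium` is set, candidates whose risk premium exceeds it are
    excluded (an assumed reading of "capping" the premium). If none pass,
    fall back to the candidate with the smallest premium.
    """
    scored = {
        name: (entropic_value(s, beta), risk_premium(s, beta))
        for name, s in candidates.items()
    }
    allowed = {
        name: v for name, v in scored.items()
        if max_premium is None or v[1] <= max_premium
    }
    if not allowed:
        return min(scored, key=lambda name: scored[name][1])
    return max(allowed, key=lambda name: allowed[name][0])


# Hypothetical scores: "A" has the higher mean, but annotators disagree on it.
candidates = {
    "A": [0.9, 0.9, 0.1],
    "B": [0.6, 0.6, 0.6],
}
chosen = select_response(candidates, beta=5.0, max_premium=0.1)
```

With these made-up scores, a mean-based ranker would prefer "A", while the risk-aware selection picks "B" because the annotators agree on it. Raising `beta` penalizes disagreement more heavily; as `beta` approaches zero the entropic value approaches the plain mean.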

Alignment · Reinforcement learning · Evals

Summary generated by Claude — human-verified
