arXiv cs.AI

DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

Signal: 75 · Hype: 15
In three lines: DARC is an inference-time method that recasts response selection as distributionally robust optimization under annotator disagreement. It reranks candidate responses by maximizing KL-robust satisfaction objectives and exposes deployment controls that cap the entropic risk premium, all without retraining.
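The reranking idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each candidate comes with a list of per-annotator satisfaction scores, and it uses the fixed-temperature entropic certainty equivalent, the form that appears in the standard dual of a KL-constrained worst-case expectation. The names `entropic_value`, `rerank`, `lam`, and `premium_cap` are hypothetical, as are the example scores.

```python
import math


def entropic_value(scores, lam):
    """Entropic certainty equivalent: -(1/lam) * log(mean(exp(-lam * s))).

    Never exceeds the plain mean, and falls further below it as the
    scores disagree more. `lam` > 0 is an assumed risk-aversion knob.
    """
    m = min(scores)
    # Shift by the minimum score so every exponent is <= 0 (no overflow).
    total = sum(math.exp(-lam * (s - m)) for s in scores)
    return m - math.log(total / len(scores)) / lam


def rerank(candidates, lam=2.0, premium_cap=0.2):
    """Pick the candidate with the highest entropic value.

    candidates: dict mapping a candidate id to its per-annotator scores.
    premium_cap: assumed deployment control; candidates whose risk premium
    (mean minus entropic value) exceeds it are excluded when possible.
    """
    rows = []
    for name, scores in candidates.items():
        mean = sum(scores) / len(scores)
        value = entropic_value(scores, lam)
        rows.append((name, value, mean - value))
    # Fall back to all candidates if none satisfies the cap.
    allowed = [r for r in rows if r[2] <= premium_cap] or rows
    return max(allowed, key=lambda r: r[1])[0]


# Hypothetical scores: "divisive" has the higher mean but one unhappy annotator.
candidates = {
    "consensus": [0.7, 0.7, 0.7],
    "divisive": [1.0, 1.0, 0.2],
}
best_by_mean = max(candidates, key=lambda k: sum(candidates[k]) / len(candidates[k]))
best_robust = rerank(candidates)
```

In this toy case, ranking by the mean favors the divisive response, while the disagreement-aware score favors the one all annotators find acceptable. Raising `lam` makes the selection more conservative, and lowering `premium_cap` excludes high-disagreement candidates outright.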
Alignment · Reinforcement learning · Evals

Summary generated by Claude — human-verified