DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding
Signal: 75 · Hype: 15
In three lines: DARC is a retraining-free, inference-time method that reformulates response selection as distributionally robust optimization under annotator disagreement. It reranks candidate responses by maximizing KL-robust satisfaction objectives, and it exposes deployment controls that cap the entropic risk premium.
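To make the reranking idea concrete, here is a minimal sketch of how a KL-robust score could be computed from per-annotator satisfaction ratings. It is not DARC's actual algorithm, which the summary does not spell out. It assumes the standard entropic certainty equivalent, -(1/beta) * log E[exp(-beta * s)], which is the fixed-temperature dual form of a KL-constrained worst case, and it treats the "entropic risk premium" as the gap between the plain mean and that value. The function names, the `beta` temperature, and the `premium_cap` parameter are all hypothetical.

```python
import math

def entropic_value(scores, beta):
    """Entropic certainty equivalent: -(1/beta) * log(mean(exp(-beta * s))).

    Assumption: this stands in for the KL-robust satisfaction objective.
    Larger beta puts more weight on the least satisfied annotators.
    """
    m = min(scores)
    # Shift by the minimum so every exponent is <= 0 (numerically stable).
    mean_exp = sum(math.exp(-beta * (s - m)) for s in scores) / len(scores)
    return m - math.log(mean_exp) / beta

def risk_premium(scores, beta):
    """Mean satisfaction minus its entropic value (non-negative by Jensen)."""
    return sum(scores) / len(scores) - entropic_value(scores, beta)

def select(candidates, beta, premium_cap):
    """Pick the candidate with the highest entropic value among those whose
    premium stays under the cap (hypothetical deployment control).

    candidates: dict mapping a candidate id to a list of annotator scores.
    Falls back to the lowest-premium candidate if none satisfies the cap.
    """
    feasible = [c for c, s in candidates.items()
                if risk_premium(s, beta) <= premium_cap]
    if not feasible:
        return min(candidates, key=lambda c: risk_premium(candidates[c], beta))
    return max(feasible, key=lambda c: entropic_value(candidates[c], beta))

# Illustrative data only: "a" has a higher mean but splits annotators,
# "b" is rated identically by everyone.
example = {"a": [0.9, 0.9, 0.1], "b": [0.6, 0.6, 0.6]}
robust_choice = select(example, beta=5.0, premium_cap=1.0)
near_mean_choice = select(example, beta=1e-6, premium_cap=1.0)
```

With a large `beta` the sketch prefers the consensus candidate, while a `beta` near zero reduces to ranking by the mean; tightening `premium_cap` excludes candidates whose appeal depends heavily on ignoring dissenting annotators.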
Summary generated by Claude — human-verified