arXiv cs.AI

When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop

Signal: 72
Hype: 15
In three lines: A theoretical study of foundation models trained on synthetic data generated by other model iterations. The authors show that human curation of one model can degrade the alignment of other models through cross-model interactions, unlike isolated settings, where curation always improves alignment.
Alignment · Reinforcement learning · Papers

Summary generated by Claude — human-verified