arXiv cs.CL

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

Signal: 72 · Hype: 18
In three lines: CroCo extends contrastive preference tuning on self-generations to 14 languages, both high- and low-resource. A reward model trained only on English preferences produces useful within-language rankings in the other languages, with no language-specific annotation. Gains are confirmed on EuroLLM-9B and Aya-3B using on-policy data.
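To make the "within-language ranking" idea concrete, here is a minimal sketch of how preference pairs might be built from self-generations. It is not taken from the paper: the `build_preference_pairs` helper, the best-versus-worst pairing rule, and the `toy_reward` scorer are all assumptions for illustration. In practice the scorer would be the English-trained reward model and the candidates would be sampled from the policy being tuned.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (language code, prompt, candidate response).
Sample = Tuple[str, str, str]


def build_preference_pairs(
    samples: List[Sample],
    reward_fn: Callable[[str, str], float],
) -> List[Dict[str, str]]:
    """Group candidates by (language, prompt), score them, keep best vs. worst.

    Candidates are only ever compared with others for the same prompt in the
    same language, so the scorer needs to rank consistently within a language,
    not across languages.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for lang, prompt, response in samples:
        groups[(lang, prompt)].append(response)

    pairs: List[Dict[str, str]] = []
    for (lang, prompt), candidates in groups.items():
        if len(candidates) < 2:
            continue  # a pair needs at least two candidates
        scored = sorted(
            ((reward_fn(prompt, r), r) for r in candidates),
            key=lambda item: item[0],
        )
        (low_score, rejected), (high_score, chosen) = scored[0], scored[-1]
        if high_score == low_score:
            continue  # tied scores carry no preference signal
        pairs.append(
            {"lang": lang, "prompt": prompt, "chosen": chosen, "rejected": rejected}
        )
    return pairs


def toy_reward(prompt: str, response: str) -> float:
    """Stand-in scorer (assumption): longer responses score higher."""
    return float(len(response))
```

The resulting `prompt` / `chosen` / `rejected` records follow the layout commonly used for preference tuning datasets; whether CroCo pairs best against worst or uses another selection rule is not stated in this summary.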
Reinforcement learning · Papers

Summary generated by Claude — human-verified