CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations
Signal: 72
Hype: 18
In three lines
CroCo extends contrastive preference tuning on self-generations to 14 languages (high- and low-resource). A reward model trained on English preferences produces useful within-language rankings across languages without language-specific annotation. Gains are confirmed on EuroLLM-9B and Aya-3B with on-policy data.
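As a rough illustration of the within-language ranking idea, the sketch below groups a model's own sampled responses by language and prompt, scores them with a reward function, and keeps the best and worst response as a preference pair. Everything named here is hypothetical: `build_preference_pairs`, `toy_score`, and the best-versus-worst pairing rule are assumptions made for the example, and the summary does not specify how CroCo actually selects pairs or which loss it trains with.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (language, prompt, response) triples sampled from the policy model itself.
Sample = Tuple[str, str, str]


def build_preference_pairs(
    samples: List[Sample],
    score: Callable[[str, str], float],
) -> List[Dict[str, str]]:
    """Rank self-generations within each (language, prompt) group.

    `score` stands in for a reward model trained on English preferences.
    Pairing the top- and bottom-ranked response is an assumption of this
    sketch, not a detail taken from the source.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for language, prompt, response in samples:
        groups[(language, prompt)].append(response)

    pairs = []
    for (language, prompt), responses in groups.items():
        if len(responses) < 2:
            continue  # a single response cannot form a preference pair
        ranked = sorted(responses, key=lambda r: score(prompt, r), reverse=True)
        pairs.append(
            {
                "language": language,
                "prompt": prompt,
                "chosen": ranked[0],
                "rejected": ranked[-1],
            }
        )
    return pairs


def toy_score(prompt: str, response: str) -> float:
    """Placeholder reward: longer responses score higher (illustration only)."""
    return float(len(response))


samples = [
    ("de", "Q1", "kurz"),
    ("de", "Q1", "eine längere Antwort"),
    ("sw", "Q2", "jibu"),
    ("sw", "Q2", "jibu refu zaidi"),
    ("fr", "Q3", "seul"),
]
pairs = build_preference_pairs(samples, toy_score)
```

Because responses are only compared with others for the same prompt in the same language, the reward model never has to be calibrated across languages; it only has to order candidates within a group.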
Summary generated by Claude — human-verified