arXiv cs.CL·27 May 2026

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

In three lines: CroCo extends contrastive preference tuning on self-generations to 14 languages (high- and low-resource). A reward model trained on English preferences produces useful within-language rankings across languages without language-specific annotation. Gains are confirmed on EuroLLM-9B and Aya-3B with on-policy data.
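As a rough illustration of the idea in the summary, the sketch below turns a policy's own sampled responses into preference pairs by ranking them within each language using a single scoring function. It is a minimal sketch under stated assumptions, not the paper's method: the names `PreferencePair`, `build_pairs`, and `score` are hypothetical, and the best-versus-worst pairing rule is an assumption, since the summary does not say how pairs are selected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PreferencePair:
    language: str
    prompt: str
    chosen: str
    rejected: str


def build_pairs(
    generations: Dict[str, Dict[str, List[str]]],
    score: Callable[[str, str], float],
) -> List[PreferencePair]:
    """Build one (chosen, rejected) pair per prompt from self-generations.

    `generations` maps language -> prompt -> responses sampled from the policy.
    `score(prompt, response)` stands in for a reward model (hypothetical
    interface); responses are only compared against others for the same
    prompt, so rankings stay within a language.
    """
    pairs: List[PreferencePair] = []
    for language, by_prompt in generations.items():
        for prompt, responses in by_prompt.items():
            if len(responses) < 2:
                continue  # a pair needs at least two candidates
            scored = sorted((score(prompt, r), r) for r in responses)
            (low_score, worst), (high_score, best) = scored[0], scored[-1]
            if low_score == high_score:
                continue  # no preference signal when all scores tie
            # Assumed pairing rule: highest-scored vs lowest-scored response.
            pairs.append(PreferencePair(language, prompt, best, worst))
    return pairs
```

The resulting pairs could then feed any pairwise preference objective; which objective the paper uses is not stated in the summary.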

Reinforcement learning Papers

Summary generated by Claude — human-verified
