Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning
In three lines:
PolyFact is a 100K-question multilingual factual QA dataset grounded in Wikidata and spanning 12 languages.
Using it, the authors compare three approaches for improving cross-lingual factual consistency in Qwen-2.5-7B and OLMo-2-1124-7B.
GRPO (Group Relative Policy Optimization) outperforms supervised fine-tuning: it reduces language specialization in MLP layers and attention heads, promoting shared cross-lingual representations.
Summary generated by Claude — human-verified
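The summary does not say how cross-lingual consistency is measured or turned into a reward. As a minimal sketch under stated assumptions (not the paper's method), the Python below illustrates two ingredients such a setup might combine. The first is a pairwise agreement score across languages, assuming each answer has already been normalized to a Wikidata entity ID (QID). The second is GRPO's group-relative advantage, which normalizes each sampled completion's reward by the mean and standard deviation of its group. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_consistency(answers: dict[str, str]) -> float:
    """Hypothetical metric: fraction of language pairs whose answers match.

    `answers` maps a language code to a normalized answer, assumed here to be
    a Wikidata QID so that answers in different scripts can be compared.
    """
    pairs = list(combinations(sorted(answers), 2))
    if not pairs:
        # A single language is trivially consistent with itself.
        return 1.0
    agree = sum(answers[a] == answers[b] for a, b in pairs)
    return agree / len(pairs)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group of completions sampled for the same prompt.

    Each reward is centered on the group mean and scaled by the group's
    (population) standard deviation; eps avoids division by zero when all
    rewards in the group are equal, in which case every advantage is 0.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Only the en/fr pair agrees, so 1 of 3 pairs is consistent.
    print(pairwise_consistency({"en": "Q90", "fr": "Q90", "de": "Q64"}))
    # Rewards of 1 and 0 around a mean of 0.5 give advantages near +1 and -1.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In a consistency-driven setup, one plausible (assumed) design is to feed a score like `pairwise_consistency`, possibly combined with answer correctness, into the per-completion reward that `group_relative_advantages` normalizes. Because the advantage is relative to the group, no separate value network is needed.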