arXiv cs.CL · 8 June 2026

Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

In three lines: The paper introduces PolyFact, a 100K multilingual factual QA dataset grounded in Wikidata and covering 12 languages, and uses it to evaluate three approaches to improving cross-lingual factual consistency in Qwen-2.5-7B and OLMo-2-1124-7B. GRPO outperforms supervised fine-tuning by reducing language specialization in MLP layers and attention heads, which promotes shared cross-lingual representations.

Tags: Benchmarks, Reinforcement learning, Qwen, Open source

Summary generated by Claude — human-verified
