arXiv cs.CL

DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification

Signal: 78 · Hype: 25
In three lines: DecomposeRL uses reinforcement learning (GRPO) to combine accurate claim verification with inspectable traces. A 7B model trained on 5K curated claims achieves 86.3% in-domain and 69.8% out-of-domain accuracy, matching 32B baselines and GPT-4.1-mini. It also works in semi-supervised settings with only 10% labeled data.
Reasoning · Reinforcement learning · Benchmarks · Evals

Summary generated by Claude — human-verified