arXiv cs.CL

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

Signal: 78
Hype: 25
In three lines
GoLongRL presents a fully open-source post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). The authors release a 23K-sample dataset spanning 9 task types and introduce TMN-Reweight to optimize heterogeneous rewards. Qwen3-30B-A3B achieves performance comparable to DeepSeek-R1 and Qwen3-235B.
Reinforcement learning, Reasoning, Benchmarks, Open source, Qwen

Summary generated by Claude — human-verified