arXiv cs.CL

Playing with Words, Improving with Rewards: Training Language Models for Creative Association

Signal: 75
Hype: 25
In three lines: Qwen models (1.7B, 4B, 8B) are trained on the Codenames game with Reinforcement Learning with Verifiable Rewards (RLVR) to improve creativity. The 8B model gains creativity (+8/10 benchmarks) with minor reasoning degradation, while the smaller models prioritize precision. The study examines the creativity-precision trade-off across model scales.
Qwen · Reinforcement learning · Reasoning · Benchmarks

Summary generated by Claude — human-verified