arXiv cs.CL

When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

Signal: 72
Hype: 15
In three lines: A study on generating long-form literary reviews based on the Torrance Test of Creative Writing (TTCW). The dataset contains 263,911 stories annotated across 14 creativity dimensions. Fine-tuning Qwen3 (4B and 8B) shows that non-reasoning supervision achieves better performance (0.6820), while reasoning-supervised models fail to complete the required 14-metric review format.
Tags: Qwen, Fine-tuning, Reasoning, Evals, Papers

Summary generated by Claude — human-verified