When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation
Signal: 72 | Hype: 15
In three lines: A study on generating long-form literary reviews based on the Torrance Test of Creative Writing (TTCW), using a dataset of 263,911 stories annotated across 14 creativity dimensions. Fine-tuning Qwen3 (4B and 8B) shows that non-reasoning supervision performs better (score of 0.6820). Reasoning-supervised models, by contrast, fail to complete the required 14-metric review format.
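The reasoning-supervised failure is a format-completeness problem: a review only counts if every one of the 14 metrics is assessed. Below is a minimal sketch of such a completeness check. It assumes a hypothetical output format in which each metric starts its own line as `name: assessment`. The names `metric_1` through `metric_14` are placeholders, not the paper's actual TTCW dimension names or its evaluation code.

```python
import re
from typing import Sequence

# Placeholder identifiers (hypothetical); the summary does not list the 14 TTCW dimension names.
REQUIRED_METRICS = [f"metric_{i}" for i in range(1, 15)]


def missing_metrics(review: str, required: Sequence[str] = REQUIRED_METRICS) -> list[str]:
    """Return the required metrics that never appear as a 'name:' line in the review."""
    # Assumed format: each assessed metric begins its own line as '<name>: <assessment>'.
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^\s*([^:\n]+):", review, flags=re.MULTILINE)
    }
    # Exact matching, so 'metric_1' is not satisfied by 'metric_10'.
    return [name for name in required if name.lower() not in found]


def is_complete(review: str, required: Sequence[str] = REQUIRED_METRICS) -> bool:
    """A review is complete only if no required metric is missing."""
    return not missing_metrics(review, required)


if __name__ == "__main__":
    full = "\n".join(f"metric_{i}: assessment" for i in range(1, 15))
    truncated = "\n".join(f"metric_{i}: assessment" for i in range(1, 10))
    print(is_complete(full))            # True
    print(missing_metrics(truncated))   # ['metric_10', ..., 'metric_14']
```

A review that stops after nine metrics, as a model might when its generation budget runs out mid-review, is flagged with the five metrics it never reached. A stricter check could also require the metrics to appear in a fixed order.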
Summary generated by Claude — human-verified