arXiv cs.AI

SDR: Set-Distance Rewards for Radiology Report Generation

Signal: 78
Hype: 15
In three lines: A new set-distance reward (SDR) method for reinforcement learning of vision-language models on chest X-ray report generation. Tested on Qwen3-VL and Gemma3 with GRPO, it shows improvements over supervised fine-tuning of 6.80% (BERTScore), 7.82% (RadGraph F1), and 4.45% (CheXbert F1). It also enables test-time best-of-N selection and mid-generation pruning, reducing tokens by 50%.
Reinforcement learning · Vision · Code generation · Evals · Papers

Summary generated by Claude — human-verified