arXiv cs.AI · 2 June 2026

SDR: Set-Distance Rewards for Radiology Report Generation

In three lines: A new set-distance reward method for reinforcement learning of vision-language models on chest X-ray report generation. Tested with GRPO on Qwen3-VL and Gemma3, it improves over supervised fine-tuning by 6.80% on BERTScore, 7.82% on RadGraph F1, and 4.45% on CheXbert F1. It also enables test-time best-of-N selection and mid-generation pruning, which reduces tokens by 50%.

Reinforcement learning Vision Code generation Evals Papers

Summary generated by Claude and human-verified.