arXiv cs.LG

GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

Signal: 78 · Hype: 15
In three lines: GAC is an adaptive controller for hybrid SFT-RL post-training. It adjusts the mixing weights between the two training signals dynamically, using online estimates of each signal's gradient variance and of the disagreement between them. Tested on math, code, science, and logic benchmarks, GAC improves over fixed-weight baselines with less than 1% computational overhead.
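To make the idea of noise-aware mixing concrete, here is a minimal sketch of one way a controller could turn variance and disagreement estimates into a mixing weight. It is not GAC's actual rule, which this summary does not specify. The class name `AdaptiveMixer`, the EMA decay `beta`, the inverse-variance weighting, and the cosine-based disagreement term are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class AdaptiveMixer:
    """Hypothetical controller (not the paper's algorithm).

    Tracks an exponential moving average (EMA) of each signal's gradient and
    of its squared deviation from that average, then favors the less noisy
    signal. Strong disagreement pulls the weight back toward an even mix.
    """

    def __init__(self, beta=0.9, eps=1e-8):
        self.beta = beta  # EMA decay (assumed value)
        self.eps = eps    # avoids division by zero
        self.mean = {"sft": None, "rl": None}
        self.var = {"sft": 0.0, "rl": 0.0}

    def _update(self, name, grad):
        if self.mean[name] is None:
            # First observation: initialize the running mean, no noise yet.
            self.mean[name] = list(grad)
            return
        m = self.mean[name]
        dev = sum((g - mi) ** 2 for g, mi in zip(grad, m))
        self.var[name] = self.beta * self.var[name] + (1 - self.beta) * dev
        self.mean[name] = [
            self.beta * mi + (1 - self.beta) * g for g, mi in zip(grad, m)
        ]

    def step(self, g_sft, g_rl):
        """Return (w_rl, mixed_gradient) for one training step."""
        self._update("sft", g_sft)
        self._update("rl", g_rl)
        # Inverse-variance weighting: the noisier signal gets less weight.
        p_sft = 1.0 / (self.var["sft"] + self.eps)
        p_rl = 1.0 / (self.var["rl"] + self.eps)
        w_rl = p_rl / (p_sft + p_rl)
        # Disagreement in [0, 1]: 0 when aligned, 1 when opposed.
        disagreement = 0.5 * (1.0 - cosine(g_sft, g_rl))
        w_rl = (1.0 - disagreement) * w_rl + disagreement * 0.5
        mixed = [(1.0 - w_rl) * a + w_rl * b for a, b in zip(g_sft, g_rl)]
        return w_rl, mixed


mixer = AdaptiveMixer()
first_w, _ = mixer.step([1.0, 0.0], [1.0, 1.0])        # no noise estimate yet
second_w, mixed = mixer.step([1.0, 0.0], [1.0, -1.0])  # RL gradient flipped
```

In this toy run the SFT gradient is steady while the RL gradient flips direction, so the RL weight drops well below an even split on the second step. The only extra per-step work is a few vector reductions, which fits the summary's low-overhead claim in spirit, though the paper's own accounting may differ.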
Reinforcement learning · Fine-tuning · Benchmarks

Summary generated by Claude — human-verified