GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
Signal: 78
Hype: 15
In three lines
GAC is an adaptive controller for hybrid SFT-RL post-training that dynamically adjusts the mixing weights based on online estimates of gradient variance and of disagreement between the two training signals. Tested on math, code, science, and logic benchmarks, GAC improves on fixed baselines with less than 1% computational overhead.
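To make the idea of noise-aware mixing concrete, here is a minimal toy sketch in Python. It is not GAC's actual update rule, which this summary does not specify. The class name `AdaptiveMixer`, the EMA decay `beta`, the inverse-variance weighting, and the cosine-based disagreement term are all assumptions chosen for illustration. It only shows how running estimates of gradient variance and gradient disagreement could be turned into a pair of mixing weights.

```python
import numpy as np


class AdaptiveMixer:
    """Toy mixing controller (hypothetical; not the paper's algorithm)."""

    def __init__(self, beta=0.9, eps=1e-8):
        self.beta = beta  # assumed EMA decay for the running statistics
        self.eps = eps    # guards against division by zero
        self.mean = {}    # running mean gradient per training signal
        self.var = {}     # running scalar variance estimate per signal

    def _update(self, name, g):
        # First observation: initialise the mean, no variance information yet.
        if name not in self.mean:
            self.mean[name] = g.copy()
            self.var[name] = 0.0
            return
        diff = g - self.mean[name]
        self.mean[name] = self.beta * self.mean[name] + (1 - self.beta) * g
        self.var[name] = (
            self.beta * self.var[name] + (1 - self.beta) * float(diff @ diff)
        )

    def step(self, g_sft, g_rl):
        """Return (w_sft, w_rl) for the current pair of flattened gradients."""
        self._update("sft", g_sft)
        self._update("rl", g_rl)

        # Inverse-variance weighting: the noisier signal gets less weight.
        inv_sft = 1.0 / (self.var["sft"] + self.eps)
        inv_rl = 1.0 / (self.var["rl"] + self.eps)
        w_rl = inv_rl / (inv_sft + inv_rl)

        # Disagreement: cosine similarity in [-1, 1] mapped to [0, 1].
        norms = np.linalg.norm(g_sft) * np.linalg.norm(g_rl) + self.eps
        agreement = 0.5 * (1.0 + float(g_sft @ g_rl) / norms)

        # When the two gradients conflict, shrink toward an even split.
        w_rl = agreement * w_rl + (1.0 - agreement) * 0.5
        return 1.0 - w_rl, w_rl
```

In a training loop, the returned weights would scale the two losses (or gradients) before the optimizer step. The bookkeeping is a few vector operations per step, which is in the spirit of the low overhead reported above, though the real cost depends on how the gradients are obtained.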
Summary generated by Claude — human-verified