Reddit r/MachineLearning

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

Signal: 62 · Hype: 35
In three lines: RPS is a two-stage post-training method inspired by neuroplasticity. It trains first on easy data with a high learning rate, then on hard data with the learning rate reduced by 90%. On Qwen3-8b, RPS achieves 4% on ARC-AGI 1 and 1145/1200 (about 95.4%) error-free program executions, versus 2.4% and 870/1200 (72.5%) for EPS (equal rate).
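The two-stage schedule described in the summary can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: a toy one-parameter regression stands in for the LLM, and the names `Stage`, `two_stage_schedule`, and `train`, the base learning rate, and the data are all hypothetical. The only detail taken from the summary is the ordering (easy data first, then hard data) and the 90% learning-rate reduction in the second stage.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y) pair for a toy 1-D regression


@dataclass
class Stage:
    name: str
    data: Sequence[Example]
    learning_rate: float


def two_stage_schedule(
    easy: Sequence[Example],
    hard: Sequence[Example],
    base_lr: float,
    reduction: float = 0.9,  # the 90% reduction mentioned in the summary
) -> List[Stage]:
    """Stage 1 uses easy data at base_lr; stage 2 uses hard data at a reduced rate."""
    return [
        Stage("easy", easy, base_lr),
        Stage("hard", hard, base_lr * (1.0 - reduction)),
    ]


def train(weight: float, stages: Sequence[Stage], epochs_per_stage: int = 1) -> float:
    """Plain SGD on squared error for y ~ weight * x, running the stages in order."""
    for stage in stages:
        for _ in range(epochs_per_stage):
            for x, y in stage.data:
                grad = 2.0 * (weight * x - y) * x  # d/dw of (w*x - y)^2
                weight -= stage.learning_rate * grad
    return weight


# Hypothetical toy data: "easy" points lie exactly on y = 2x, "hard" points are noisy.
easy = [(1.0, 2.0), (2.0, 4.0)]
hard = [(3.0, 6.3), (4.0, 7.8)]
stages = two_stage_schedule(easy, hard, base_lr=0.05)
final_weight = train(0.0, stages, epochs_per_stage=20)
```

In a real post-training run the same structure would presumably map onto two consecutive fine-tuning passes with different optimizer learning rates; an equal-rate baseline would correspond to calling `two_stage_schedule` with `reduction=0.0`.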
Tags: Qwen · Fine-tuning · Code generation · Benchmarks

Summary generated by Claude — human-verified