Reddit r/MachineLearning·21 May 2026

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

Signal

Hype

In three linesRPS is a two-stage post-training method inspired by neuroplasticity: easy data with high learning rate, then hard data with 90% reduced rate. On Qwen3-8b, RPS achieves 4% on ARC-AGI 1 and 1145/1200 error-free program executions versus 2.4% and 870/1200 for EPS (equal rate).

Read source

Your take?

Qwen Fine-tuning Code generation Benchmarks

Summary generated by Claude — human-verified

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

Other angles on this story