DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
Signal: 78 | Hype: 15
In three lines
DISA is an offline RL method for LLMs that decouples partition-function estimation, done via importance sampling, from policy optimization. On 9 benchmarks spanning math and code, it matches or exceeds FlowRL and outperforms GRPO and GSPO. It also retains substantially more strategy-level diversity than reward-maximization baselines.
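The summary does not give DISA's equations, so what follows is only a hedged sketch of the general technique it names. It is not the authors' implementation. It assumes a reward-tilted target of the form π*(y|x) ∝ π_ref(y|x)·exp(β·r(x,y)), with offline responses drawn from a behavior policy μ. Under that assumption, the partition function Z(x) = E_{y~μ}[π_ref(y|x)/μ(y|x)·exp(β·r(x,y))] can be estimated once from the fixed samples. The target, β, and the function names `log_partition_is` and `target_weights` are all hypothetical.

```python
import math
from typing import List, Sequence


def _log_terms(logp_ref: Sequence[float], logp_behavior: Sequence[float],
               rewards: Sequence[float], beta: float) -> List[float]:
    # Per-sample log of  pi_ref(y|x) / mu(y|x) * exp(beta * r(x, y)).
    # ASSUMPTION: reward-tilted target over pi_ref; mu is the offline behavior policy.
    if not rewards or not (len(logp_ref) == len(logp_behavior) == len(rewards)):
        raise ValueError("inputs must be non-empty and of equal length")
    return [lr - lb + beta * r for lr, lb, r in zip(logp_ref, logp_behavior, rewards)]


def _logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))) by shifting by the max.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition_is(logp_ref: Sequence[float], logp_behavior: Sequence[float],
                     rewards: Sequence[float], beta: float = 1.0) -> float:
    """Hypothetical importance-sampling estimate of log Z(x) for one prompt x."""
    terms = _log_terms(logp_ref, logp_behavior, rewards, beta)
    # log of the sample mean of the importance-weighted terms (log-mean-exp).
    return _logsumexp(terms) - math.log(len(terms))


def target_weights(logp_ref: Sequence[float], logp_behavior: Sequence[float],
                   rewards: Sequence[float], beta: float = 1.0) -> List[float]:
    """Hypothetical self-normalized weights of each offline sample under the target."""
    terms = _log_terms(logp_ref, logp_behavior, rewards, beta)
    lse = _logsumexp(terms)
    return [math.exp(t - lse) for t in terms]


if __name__ == "__main__":
    # Two offline responses to one prompt; behavior policy equals the reference here.
    logp = [0.0, 0.0]
    rewards = [0.0, math.log(3)]
    print(round(log_partition_is(logp, logp, rewards), 3))          # 0.693 (= log 2)
    print([round(w, 3) for w in target_weights(logp, logp, rewards)])  # [0.25, 0.75]
```

Because log Z(x) is computed from fixed offline samples, a policy update could treat it as a constant rather than learning it jointly. That is one plausible reading of "decoupling," though the summary does not confirm it. Plain importance weights grow high-variance when μ is far from π_ref, which is why the sketch works in log space and also exposes self-normalized weights.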
Summary generated by Claude — human-verified