arXiv cs.LG·21 May 2026

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

In three lines

CSA (Conformal Selective Acting) is a deployment wrapper for RLVR-fine-tuned LLMs that guarantees per-round risk control without pooling across deployments. It maintains a Ville e-process per threshold and achieves the selective-risk bound R_T^act ≤ α + O(N_T^{-1/2}) with anytime pathwise validity. It was tested on 480 specialist streams and 10,300 Expert-Iteration rounds with LoRA.
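To make "a Ville e-process per threshold" concrete, here is a minimal sketch of a generic betting-style test supermartingale. It is not the paper's construction: the class name `ThresholdEProcess`, the fixed bet size `bet`, and the assumption that each acted round yields a loss in [0, 1] are all illustrative choices. The process tests the null hypothesis that the conditional expected loss at a given threshold is at least α; by Ville's inequality, the chance that its wealth ever reaches 1/δ under that null is at most δ, which is what allows checking it after every round.

```python
import math


class ThresholdEProcess:
    """Test supermartingale for H0: E[loss | past] >= alpha (hypothetical sketch)."""

    def __init__(self, alpha: float, delta: float, bet: float = 0.5):
        if not (0.0 < alpha < 1.0 and 0.0 < delta < 1.0):
            raise ValueError("alpha and delta must lie in (0, 1)")
        # The bet must keep every factor 1 + bet * (alpha - loss) strictly positive,
        # including the worst case loss = 1.
        if not (0.0 <= bet < 1.0 / (1.0 - alpha)):
            raise ValueError("bet must lie in [0, 1 / (1 - alpha))")
        self.alpha = alpha
        self.delta = delta
        self.bet = bet
        self.log_wealth = 0.0   # log of the running product, starts at log(1)
        self.certified = False  # sticky: Ville's inequality bounds the running supremum

    def update(self, loss: float) -> bool:
        """Fold in one observed loss in [0, 1]; return whether H0 has been rejected."""
        if not 0.0 <= loss <= 1.0:
            raise ValueError("loss must lie in [0, 1]")
        # Under H0 each factor has conditional mean <= 1, so wealth is a supermartingale.
        self.log_wealth += math.log1p(self.bet * (self.alpha - loss))
        if self.log_wealth >= math.log(1.0 / self.delta):
            self.certified = True
        return self.certified


# Usage: a stream with no errors eventually certifies the threshold.
safe = ThresholdEProcess(alpha=0.1, delta=0.05)
for _ in range(100):
    safe.update(0.0)

# A stream of constant errors never does.
unsafe = ThresholdEProcess(alpha=0.1, delta=0.05)
for _ in range(100):
    unsafe.update(1.0)
```

With these settings each error-free round multiplies the wealth by 1.05, so certification needs a few dozen clean rounds; a smarter, data-dependent bet would typically certify faster, at the cost of more bookkeeping.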

Reinforcement learning · AI safety · Evals · Regulation

Summary generated by Claude — human-verified
