arXiv cs.LG

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Signal: 78
Hype: 15
In three lines: CSA (Conformal Selective Acting) is a deployment wrapper for RLVR-fine-tuned LLMs that guarantees per-round risk control without pooling across deployments. It maintains a Ville e-process per threshold and achieves the selective-risk bound R_T^act ≤ α + O(N_T^{-1/2}) with anytime pathwise validity. It was evaluated on 480 specialist streams and 10,300 Expert-Iteration rounds with LoRA.
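The summary does not spell out how the per-threshold e-process is built. As a generic illustration only, and not the paper's algorithm, the sketch below shows one standard way to get an anytime-valid certificate for a single threshold: a fixed-bet wealth process that tests the null hypothesis "conditional mean loss is at least α" and certifies the threshold once the wealth reaches 1/δ, which by Ville's inequality happens with probability at most δ when the null is true. The class name `ThresholdEProcess`, the fixed `bet`, and the error budget `delta` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEProcess:
    """Hypothetical helper: betting e-process for H0 'mean loss >= alpha'."""
    alpha: float            # target risk level, strictly between 0 and 1
    delta: float            # allowed probability of a false certification
    bet: float = 0.5        # fixed bet size (an assumption of this sketch)
    wealth: float = 1.0     # the process starts at E_0 = 1
    certified: bool = False

    def __post_init__(self) -> None:
        if not 0.0 < self.alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        if not 0.0 < self.delta < 1.0:
            raise ValueError("delta must lie strictly between 0 and 1")
        # The bet is capped so the factor stays nonnegative when loss == 1.
        if not 0.0 <= self.bet <= 1.0 / (1.0 - self.alpha):
            raise ValueError("bet must lie in [0, 1 / (1 - alpha)]")

    def update(self, loss: int) -> bool:
        """Fold in one verified outcome (0 = success, 1 = failure)."""
        # Under H0 this factor has conditional mean <= 1, so the wealth is a
        # nonnegative supermartingale and Ville's inequality applies.
        self.wealth *= 1.0 + self.bet * (self.alpha - loss)
        # Certification is sticky: the guarantee concerns the running supremum.
        if self.wealth >= 1.0 / self.delta:
            self.certified = True
        return self.certified
```

In a wrapper of this kind, one such process would be kept per candidate confidence threshold (with δ split across thresholds, for example by a union bound), and the model would act only above a threshold whose process has been certified. How CSA actually chooses bets, combines thresholds, and obtains the O(N_T^{-1/2}) term is left to the source paper.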
Reinforcement learning · AI safety · Evals · Regulation

Summary generated by Claude — human-verified