Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
Signal: 78
Hype: 15
In three lines
CSA (Conformal Selective Acting) is a deployment wrapper for RLVR-fine-tuned LLMs that guarantees per-round risk control without pooling across deployments. It maintains a Ville e-process for each threshold and achieves the selective-risk bound R_T^act ≤ α + O(N_T^{-1/2}) with anytime, pathwise validity. It was tested on 480 specialist streams and 10,300 Expert-Iteration rounds with LoRA.
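To make the "Ville e-process per threshold" idea concrete, here is a minimal sketch of a betting-style test supermartingale attached to each confidence threshold. It is an illustration under stated assumptions, not the paper's construction: the class names `ThresholdEProcess` and `SelectiveActor`, the fixed bet size, the union-bound split of `delta` across thresholds, and the assumption that a verifiable loss is observed on every round are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEProcess:
    """Test supermartingale for H0: mean loss above this threshold >= alpha."""
    alpha: float            # target selective risk
    delta: float            # error budget assigned to this threshold
    bet: float              # fixed bet size (an assumption of this sketch)
    wealth: float = 1.0
    certified: bool = False

    def __post_init__(self) -> None:
        # Keeps every factor 1 + bet * (alpha - loss) non-negative for loss in [0, 1].
        if not 0.0 <= self.bet <= 1.0 / (1.0 - self.alpha):
            raise ValueError("bet must lie in [0, 1 / (1 - alpha)]")

    def update(self, loss: float) -> None:
        # Under H0 the factor has conditional mean <= 1, so wealth is a
        # non-negative supermartingale; Ville's inequality then bounds
        # P(wealth ever reaches 1/delta) by delta.
        self.wealth *= 1.0 + self.bet * (self.alpha - loss)
        if self.wealth >= 1.0 / self.delta:
            self.certified = True  # sticky: the first crossing is enough


class SelectiveActor:
    """Hypothetical wrapper: act only above the lowest certified threshold."""

    def __init__(self, thresholds, alpha=0.1, delta=0.05, bet=0.5):
        # Union bound: split the total error budget evenly across thresholds.
        per_threshold = delta / len(thresholds)
        self.procs = {
            t: ThresholdEProcess(alpha, per_threshold, bet)
            for t in sorted(thresholds)
        }

    def should_act(self, confidence: float) -> bool:
        certified = [t for t, p in self.procs.items() if p.certified]
        return bool(certified) and confidence >= min(certified)

    def observe(self, confidence: float, loss: float) -> None:
        # Assumes the verifier reports a loss in [0, 1] on every round,
        # whether or not the wrapper chose to act.
        for t, p in self.procs.items():
            if confidence >= t:
                p.update(loss)
```

A caller would invoke `observe(confidence, loss)` after each round and `should_act(confidence)` before the next one. Because the guarantee comes from Ville's inequality rather than a fixed-sample test, the certification can be checked after every round without inflating the error budget, which is the sense in which such a procedure is anytime-valid.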
Summary generated by Claude — human-verified