Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs
CSA (Conformal Selective Acting) is a deployment wrapper for RLVR-fine-tuned LLMs that guarantees per-round risk control without pooling across deployments. CSA maintains one Ville e-process per threshold and achieves the selective-risk bound R_T^act ≤ α + O(N_T^{-1/2}), which holds pathwise and at any stopping time. It is evaluated on 480 specialist streams and 10,300 Expert-Iteration rounds with LoRA.
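
One way to picture a per-threshold e-process is as a betting wealth process. The sketch below is illustrative only and rests on assumptions the abstract does not state: binary verifier errors, a fixed bet size, a null hypothesis that the error rate among acted rounds at a given threshold is at least α, and the hypothetical names ThresholdEProcess, update, and certified. It is not CSA's actual construction.

```python
from dataclasses import dataclass


@dataclass
class ThresholdEProcess:
    """Betting wealth process for one threshold (hypothetical sketch).

    Tests H0: the error rate among rounds acted on at `threshold` is >= alpha.
    """
    threshold: float
    alpha: float
    bet: float              # fixed bet size, an assumption of this sketch
    wealth: float = 1.0     # e-process starts at 1
    peak: float = 1.0       # running maximum of the wealth
    n_acted: int = 0        # number of rounds acted on so far

    def __post_init__(self) -> None:
        # The bet must keep every multiplicative factor non-negative:
        # with error = 1 the factor is 1 - bet * (1 - alpha).
        if not 0.0 <= self.bet <= 1.0 / (1.0 - self.alpha):
            raise ValueError("bet must lie in [0, 1 / (1 - alpha)]")

    def update(self, score: float, error: int) -> None:
        """Consume one round; `error` is 1 if the verifier rejected the output."""
        if score < self.threshold:
            return  # abstained at this threshold, so the wealth is unchanged
        self.n_acted += 1
        self.wealth *= 1.0 + self.bet * (self.alpha - error)
        self.peak = max(self.peak, self.wealth)

    def certified(self, delta: float) -> bool:
        """True once the wealth has ever reached 1 / delta."""
        return self.peak >= 1.0 / delta


# Usage: 17 consecutive correct, confident rounds at alpha = 0.2.
proc = ThresholdEProcess(threshold=0.5, alpha=0.2, bet=1.0)
for _ in range(17):
    proc.update(score=0.9, error=0)
print(proc.n_acted, proc.certified(delta=0.05))
```

Under the null, each factor has conditional expectation at most 1, so the wealth is a non-negative supermartingale starting at 1, and Ville's inequality bounds the probability that it ever reaches 1/δ by δ. That is the property that permits checking the process after every round. Running several thresholds at once would also need a multiplicity correction, which this sketch omits.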