Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

arXiv:2605.20270v1. A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is deployed in a regulated organization with a per-deployment error budget $\alpha$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average.