AI RESEARCH
Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs
arXiv CS.CL
arXiv:2605.30501v1 Announce Type: new Watermarking embeds statistical signatures in AI-generated text for detection and attribution. We reveal a fundamental vulnerability: when users can access multiple models, as is the case today, watermarks trivially fail. Watermarks perturb output distributions away from the original, and in competitive markets these perturbations are typically independent across providers. We prove theoretically that averaging the output probability distributions of multiple models recovers the unwatermarked distribution up to a second-order error term.
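
The averaging claim can be illustrated with a small numerical sketch. This is a toy model and not the paper's construction. It assumes that every provider shares the same base next-token distribution `p`, and that each provider watermarks by adding a logit bias `delta` to its own independently drawn "green list" of tokens. The names `watermark`, `tv_distance`, and `simulate`, and all parameter values, are hypothetical. Under these assumptions the first-order effect of each perturbation has zero mean across providers, so the linear ensemble should sit much closer to `p` than any single watermarked model does.

```python
import numpy as np


def watermark(p, green_mask, delta):
    """Toy watermark (assumed, not from the paper): add `delta` to the
    logits of the tokens flagged in `green_mask`, then renormalize."""
    q = p * np.exp(delta * green_mask)
    return q / q.sum()


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()


def simulate(vocab_size=1000, num_providers=50, delta=0.5, gamma=0.5, seed=0):
    """Compare one watermarked model against a linear ensemble of many.

    Returns (single_tv, ensemble_tv): the distance from the base
    distribution to one watermarked distribution, and to the plain
    average of all watermarked distributions.
    """
    rng = np.random.default_rng(seed)

    # Shared unwatermarked next-token distribution (assumption).
    p = rng.dirichlet(np.ones(vocab_size))

    # Each provider draws its green list independently; a token is
    # green with probability `gamma`.
    masks = rng.random((num_providers, vocab_size)) < gamma

    watermarked = np.stack([watermark(p, m, delta) for m in masks])

    single_tv = tv_distance(p, watermarked[0])
    ensemble_tv = tv_distance(p, watermarked.mean(axis=0))
    return single_tv, ensemble_tv


if __name__ == "__main__":
    single_tv, ensemble_tv = simulate()
    print(f"TV(base, single watermarked model): {single_tv:.4f}")
    print(f"TV(base, averaged ensemble):        {ensemble_tv:.4f}")
```

Raising `num_providers` in this sketch shrinks the averaging noise further. What remains is a residual that does not depend on the first-order perturbation, which is consistent with, but does not prove, the second-order error term stated in the abstract.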