arXiv cs.CL · 1 June 2026

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs


In three lines

Researchers show that statistical watermarks in LLMs are vulnerable to linear ensembles. Averaging the probability distributions of 3-5 models cancels out the watermark perturbations. The attack, WASH (Watermark Attenuation via Statistical Hybridisation), defeats detection across 6 watermarking schemes, reducing z-scores from 5-300 to below 2 (detection threshold: 4) while improving output quality by 27.5%.
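
The averaging idea in the summary can be illustrated with a toy calculation. This is a minimal sketch, not the paper's WASH implementation: it assumes a hypothetical "green list" watermark in which each model adds a fixed logit bias to its own secret half of a shared vocabulary, a uniform base distribution, and a detector that knows one model's key. The names `green_mask`, `watermarked_probs`, and `expected_z`, and all constants, are illustrative assumptions.

```python
import numpy as np

# Toy constants chosen for illustration only; none come from the paper.
VOCAB, GAMMA, DELTA, T = 1000, 0.5, 2.0, 200

def green_mask(key: int) -> np.ndarray:
    """Hypothetical per-model green list: a fixed pseudo-random half of the vocabulary."""
    rng = np.random.default_rng(key)
    mask = np.zeros(VOCAB, dtype=bool)
    mask[rng.permutation(VOCAB)[: int(GAMMA * VOCAB)]] = True
    return mask

def watermarked_probs(base_logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Add the bias DELTA to green-token logits, then apply a softmax."""
    logits = base_logits + DELTA * mask
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def expected_z(probs: np.ndarray, mask: np.ndarray, n_tokens: int = T) -> float:
    """Expected detector z-score when n_tokens are sampled i.i.d. from probs."""
    green_mass = probs[mask].sum()
    return float((green_mass - GAMMA) * np.sqrt(n_tokens) / np.sqrt(GAMMA * (1 - GAMMA)))

# Assumption: all models share one vocabulary and a uniform base distribution.
base_logits = np.zeros(VOCAB)
masks = [green_mask(key) for key in range(5)]  # five models, five independent keys
per_model = np.stack([watermarked_probs(base_logits, m) for m in masks])

# Linear ensemble: the plain arithmetic mean of the five distributions.
ensemble = per_model.mean(axis=0)

single_z = expected_z(per_model[0], masks[0])  # detector for model 0 on model 0
ensemble_z = expected_z(ensemble, masks[0])    # same detector on the ensemble
print(f"single model: z = {single_z:.2f}, ensemble of 5: z = {ensemble_z:.2f}")
```

In this toy setup the single model's expected z-score sits well above the threshold of 4, while the ensemble's is far lower, because the other four models spread their bias over green lists that overlap the detector's list only by chance. The sketch says nothing about output quality or about the six schemes evaluated in the paper.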


AI safety Alignment Papers Benchmarks

Summary generated by Claude — human-verified

