Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs
Signal: 82
Hype: 25
In three lines
Researchers reveal that statistical watermarks in LLMs are vulnerable to linear ensembles. Averaging probability distributions across 3-5 models cancels out watermark perturbations. WASH (Watermark Attenuation via Statistical Hybridisation) defeats detection across 6 watermarking schemes, reducing z-scores from 5-300 to below 2 (detection threshold: 4), while improving output quality by 27.5%.
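The averaging idea can be illustrated with a toy sketch. This is not the WASH implementation from the source. It assumes a simplified green-list watermark in which each model adds a fixed bias `delta` to the logits of its own key-dependent "green" half of the vocabulary, and it assumes every model shares the same underlying logits. All names (`green_list`, `watermarked_probs`, `linear_ensemble`, `expected_z`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def green_list(key, vocab_size, gamma=0.5):
    """Hypothetical key-dependent green list covering a fraction gamma of the vocab."""
    rng = random.Random(key)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermarked_probs(logits, green, delta=2.0):
    """Assumed watermark: add delta to green-token logits, then normalise."""
    return softmax([x + delta if i in green else x for i, x in enumerate(logits)])


def linear_ensemble(dists):
    """Average several probability distributions token by token."""
    k = len(dists)
    return [sum(column) / k for column in zip(*dists)]


def green_mass(probs, green):
    """Probability that a sampled token falls in the given green list."""
    return sum(probs[i] for i in green)


def expected_z(green_fraction, num_tokens, gamma=0.5):
    """z-score a detector would see if every token were green with this probability."""
    return (green_fraction - gamma) * num_tokens / math.sqrt(
        num_tokens * gamma * (1 - gamma)
    )


VOCAB = 1000
rng = random.Random(0)
base_logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB)]  # shared toy logits

keys = [101, 202, 303, 404, 505]  # one hypothetical watermark key per model
greens = [green_list(k, VOCAB) for k in keys]
dists = [watermarked_probs(base_logits, g) for g in greens]
ensemble = linear_ensemble(dists)

# Measure how strongly model 0's watermark shows up before and after averaging.
single_mass = green_mass(dists[0], greens[0])
ensemble_mass = green_mass(ensemble, greens[0])

print(f"green mass, single model: {single_mass:.3f}")
print(f"green mass, 5-model ensemble: {ensemble_mass:.3f}")
print(f"expected z over 200 tokens, single: {expected_z(single_mass, 200):.2f}")
print(f"expected z over 200 tokens, ensemble: {expected_z(ensemble_mass, 200):.2f}")
```

Because the other models' green lists are drawn independently, each of them spreads its mass roughly evenly across model 0's green and red tokens, so averaging pulls the green fraction seen by model 0's detector back toward `gamma`. Real schemes typically derive the green list from preceding tokens, which this sketch deliberately ignores.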
Summary generated by Claude — human-verified