arXiv cs.CL

Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs

Signal: 82 · Hype: 25
In three lines
Researchers show that statistical watermarks in LLMs are vulnerable to linear ensembles: averaging the probability distributions of 3 to 5 models cancels out the watermark perturbations.
Their method, WASH (Watermark Attenuation via Statistical Hybridisation), defeats detection across 6 watermarking schemes.
It lowers detection z-scores from a range of 5 to 300 down to below 2, well under the detection threshold of 4, while improving output quality by 27.5%.
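To see why averaging distributions dilutes a watermark, here is a toy simulation. It is a minimal sketch, not WASH itself: it assumes a green-list watermark in the style of Kirchenbauer et al. (2023), where a secret key and the previous token select a fraction `GAMMA` of the vocabulary whose logits get a bias `DELTA`, and detection is the one-proportion z-test z = (G − γT) / √(Tγ(1−γ)) over T tokens with G green hits. We further assume each ensemble member carries its own independently keyed watermark and that all members share the same underlying logits. The vocabulary size, `GAMMA`, `DELTA`, the hashing scheme and every function name are hypothetical choices for illustration; the paper's six schemes and its actual ensembling procedure may differ.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (assumption)
GAMMA = 0.5      # fraction of the vocabulary on the green list (assumption)
DELTA = 2.0      # logit bias added to green tokens (assumption)


def green_list(prev_token: int, key: int) -> set[int]:
    """Hypothetical green list: seeded by the secret key and the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def watermarked_probs(base_logits: list[float], prev_token: int, key: int) -> list[float]:
    """Add DELTA to the logits of this key's green tokens, then normalise."""
    greens = green_list(prev_token, key)
    biased = [x + DELTA if i in greens else x for i, x in enumerate(base_logits)]
    return softmax(biased)


def generate(keys: list[int], length: int, seed: int = 0) -> list[int]:
    """Sample from the uniform average of one watermarked distribution per key."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(length):
        # Shared random logits stand in for models that roughly agree on content (assumption).
        base = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        dists = [watermarked_probs(base, tokens[-1], k) for k in keys]
        # Linear ensemble: each model contributes 1/len(keys) of the probability mass.
        mixed = [sum(d[i] for d in dists) / len(dists) for i in range(VOCAB_SIZE)]
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=mixed)[0])
    return tokens


def z_score(tokens: list[int], key: int) -> float:
    """One-proportion z-test on the number of green tokens under one detection key."""
    t = len(tokens) - 1
    hits = sum(tok in green_list(prev, key) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


single = generate(keys=[1], length=200)
ensemble = generate(keys=[1, 2, 3], length=200)
print(f"single watermarked model: z = {z_score(single, key=1):.2f}")
print(f"3-model average:          z = {z_score(ensemble, key=1):.2f}")
```

With a single model, nearly every step favours key 1's green list, so the z-score climbs well past 4. In the three-model average, key 1's bias carries only a third of the weight, while the other keys' biases are uncorrelated with key 1's green list and behave like noise, so the detector's z-score drops sharply. In this simplified setting, the green-token excess shrinks roughly in proportion to the number of independently keyed members.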
AI safety · Alignment · Papers · Benchmarks

Summary generated by Claude — human-verified