arXiv cs.LG

Residual Paving: Diagnosing the Routing Bottleneck in Selective Refusal Editing

Signal: 72 · Hype: 18
In three lines
Residual Paving is a routed residual editing method for frozen transformers that decouples route selectivity (whether to intervene) from residual-edit capacity (what edit to apply). On Gemma-3-4B-IT, it reduces refusal on the edit set from 88.6% to 4.0% while preserving benign behavior (95.5%) and refusal of harmful prompts (87.3%).
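To make the route/edit split concrete, here is a minimal sketch of a gated residual edit applied to a single hidden-state vector. It is not the paper's implementation: the class name `RoutedResidualEdit`, the linear-plus-sigmoid router, the fixed 0.5 threshold, and the constant edit vector are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class RoutedResidualEdit:
    """Hypothetical routed residual edit on one hidden-state vector.

    The router answers "should we intervene?" (selectivity); the edit vector
    answers "what do we add?" (capacity). Both are assumptions for illustration,
    not the design described in the paper.
    """

    def __init__(self, route_direction: Vector, route_bias: float,
                 edit_vector: Vector, threshold: float = 0.5) -> None:
        if len(route_direction) != len(edit_vector):
            raise ValueError("router and edit must match the hidden size")
        self.route_direction = route_direction
        self.route_bias = route_bias
        self.edit_vector = edit_vector
        self.threshold = threshold

    def route_score(self, h: Vector) -> float:
        # Selectivity: a linear probe on the hidden state, squashed to (0, 1).
        logit = dot(self.route_direction, h) + self.route_bias
        return 1.0 / (1.0 + math.exp(-logit))

    def apply(self, h: Vector) -> Vector:
        # Off-route inputs pass through unchanged, so the frozen model's
        # behavior is kept for them.
        if self.route_score(h) < self.threshold:
            return list(h)
        # Capacity: on-route inputs get the residual edit added.
        return [x + d for x, d in zip(h, self.edit_vector)]


editor = RoutedResidualEdit(route_direction=[1.0, 0.0], route_bias=-2.0,
                            edit_vector=[0.0, 1.0])
print(editor.apply([3.0, 0.0]))  # route score ~0.73 -> [3.0, 1.0]
print(editor.apply([1.0, 0.0]))  # route score ~0.27 -> [1.0, 0.0]
```

Because the router and the edit vector are separate parameters, this toy version lets one tighten selectivity without changing which edit is applied, and vice versa; that separation is the decoupling the summary describes.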
AI safety · Alignment · Fine-tuning · Papers

Summary generated by Claude — human-verified