Geometry-Lite: Interpretable Safety Probing via Layer-Wise Margin Geometry
Signal: 78
Hype: 15
In three lines
Geometry-Lite is a compact safety probe that analyzes hidden-state geometry across the layers of LLMs ranging from 1.2B to 70B parameters. It maps layer-wise margins using three readouts (centroid, local-neighborhood, and supervised linear-boundary) and finds that unsafe-prompt detection relies primarily on persistent margin geometry rather than on layer-to-layer motion signals.
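To make the idea of a layer-wise margin concrete, here is a minimal sketch of a centroid-style readout. It is an illustration under stated assumptions, not the paper's implementation: the function names `centroid_margins` and `margin_motion`, the Euclidean distance, the sign convention (positive means closer to the unsafe centroid), and the array layout `(n_layers, n_samples, dim)` are all hypothetical choices made for this example.

```python
import numpy as np

def centroid_margins(hidden, labels):
    """Per-layer centroid margin for each sample (illustrative definition).

    hidden: array of shape (n_layers, n_samples, dim) with hidden states.
    labels: array of shape (n_samples,), 1 = unsafe prompt, 0 = safe prompt.
    Returns an (n_layers, n_samples) array; a positive value means the
    sample sits closer to the unsafe centroid than to the safe one.
    """
    hidden = np.asarray(hidden, dtype=float)
    unsafe = np.asarray(labels).astype(bool)

    # One centroid per class and per layer, shape (n_layers, 1, dim).
    safe_centroid = hidden[:, ~unsafe].mean(axis=1, keepdims=True)
    unsafe_centroid = hidden[:, unsafe].mean(axis=1, keepdims=True)

    # Euclidean distance of every sample to each centroid, per layer.
    d_safe = np.linalg.norm(hidden - safe_centroid, axis=-1)
    d_unsafe = np.linalg.norm(hidden - unsafe_centroid, axis=-1)
    return d_safe - d_unsafe

def margin_motion(margins):
    """Layer-to-layer change in margin, shape (n_layers - 1, n_samples)."""
    return np.diff(margins, axis=0)

# Tiny synthetic example: 2 layers, 4 samples, 2 dimensions.
layer0 = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
hidden = np.stack([layer0, 2.0 * layer0])
labels = np.array([0, 0, 1, 1])

margins = centroid_margins(hidden, labels)
motion = margin_motion(margins)
```

In this framing, `margins` corresponds to the persistent geometry (how far each prompt sits from the class boundary at every layer), while `motion` corresponds to the layer-to-layer signal that the summary reports as less informative. In practice the centroids would be fit on held-out data rather than on the samples being scored.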
Summary generated by Claude — human-verified