arXiv cs.CL

Neuron-Level Interventions for Gendered and Gender-Neutral Generation in Language Models

Signal: 75
Hype: 15
In three lines
The paper studies gender-specific neurons (feminine, masculine, and gender-neutral) in language models. The authors propose a neuron-level intervention method to identify these neurons and control gendered language generation. Experiments on two open-source LMs show that gender neurons concentrate in the early layers. Code and datasets are released.
Papers, Alignment, AI safety, Evals

Summary generated by Claude — human-verified