arXiv cs.CL

Probing Persona-Dependent Preferences in Language Models

Signal: 78 · Hype: 25
In three lines:
Researchers train linear probes on residual-stream activations of Gemma-3-27B and Qwen-3.5-122B and identify a shared preference vector.
This vector predicts the model's task choices across different personas, including an evil persona, and steering along it causally controls those choices.
This reveals a largely shared preference representation underlying different behavioral modes.
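The probe-then-steer recipe summarized above can be sketched on toy data. This is a minimal illustration, not the paper's code: the 4-dimensional synthetic "activations", the hypothetical `toy_model_choice` readout, the logistic-regression probe trained by plain gradient descent, the persona offsets and the steering strength `alpha=4.0` are all assumptions. The toy builds in, by construction, that the persona shift is orthogonal to the preference axis, so it shows how a cross-persona transfer test and a steering test are run, not that the paper's finding holds.

```python
import math
import random

DIM = 4  # toy "residual stream" width (real models use thousands of dims)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def toy_model_choice(x):
    # Hypothetical stand-in for the model's behaviour: it picks task A (1)
    # whenever the activation's component on axis 0 is positive, else task B (0).
    return 1 if x[0] > 0 else 0

def make_activations(rng, persona_offset, n):
    # Synthetic activations (assumed layout): axis 0 carries the preference,
    # axis 1 is task-irrelevant noise, axes 2-3 hold a persona-specific offset.
    xs = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), 0.0, 0.0]
        xs.append([a + o for a, o in zip(x, persona_offset)])
    return xs

def train_probe(xs, ys, lr=0.5, epochs=300):
    # Linear probe: logistic regression fitted with full-batch gradient descent.
    w, b = [0.0] * DIM, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * DIM, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def probe_accuracy(w, b, xs, ys):
    preds = [1 if dot(w, x) + b > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

def steer(x, direction, alpha):
    # Add alpha times the unit-normalised probe direction to the activation.
    norm = math.sqrt(dot(direction, direction))
    return [xi + alpha * di / norm for xi, di in zip(x, direction)]

rng = random.Random(0)
default_offset = [0.0, 0.0, 0.0, 0.0]
evil_offset = [0.0, 0.0, 3.0, -2.0]  # assumed orthogonal to the preference axis

# Fit the probe on the default persona only.
train_x = make_activations(rng, default_offset, 400)
train_y = [toy_model_choice(x) for x in train_x]
w, b = train_probe(train_x, train_y)

# Transfer test: apply the same probe to the "evil" persona.
evil_x = make_activations(rng, evil_offset, 400)
evil_y = [toy_model_choice(x) for x in evil_x]
default_acc = probe_accuracy(w, b, train_x, train_y)
evil_acc = probe_accuracy(w, b, evil_x, evil_y)

# Steering test: push evil-persona activations that chose task B along the probe direction.
rejected = [x for x in evil_x if toy_model_choice(x) == 0]
flipped = sum(toy_model_choice(steer(x, w, alpha=4.0)) for x in rejected)
flip_rate = flipped / len(rejected)
print(f"probe acc default={default_acc:.2f} evil={evil_acc:.2f} flip rate={flip_rate:.2f}")
```

In a real model, the activations would be read from a residual-stream layer, and the steering test would check whether the model's generated task choices change, rather than relying on a hand-written readout like `toy_model_choice`.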
Gemini · Qwen · Reasoning · Alignment · Papers

Summary generated by Claude — human-verified