arXiv cs.AI

Probing Persona-Dependent Preferences in Language Models

Signal: 75
Hype: 15
In three lines: A study of internal preferences in LLMs using linear probes on residual-stream activations. The researchers identify a shared preference vector in Gemma-3-27B and Qwen-3.5-122B that both predicts and causally controls model choices. This vector remains stable even when the model adopts radically different personas (for example, a helpful assistant versus an evil persona).
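To make "predicts and causally controls" concrete, here is a minimal sketch on synthetic data, not the paper's method or code. It assumes a mean-difference probe (a simple stand-in for whatever linear probe the authors trained), made-up 16-dimensional "activations" with a planted preference direction, and hypothetical helper names (`fit_mean_difference_probe`, `steer`). It does not model personas or any real model.

```python
import random


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def fit_mean_difference_probe(acts, labels):
    """Unit vector pointing from the mean of class 0 to the mean of class 1."""
    dim = len(acts[0])
    pos = [a for a, y in zip(acts, labels) if y == 1]
    neg = [a for a, y in zip(acts, labels) if y == 0]
    diff = [
        sum(a[i] for a in pos) / len(pos) - sum(a[i] for a in neg) / len(neg)
        for i in range(dim)
    ]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]


def steer(act, direction, alpha):
    """Add alpha * direction to an activation (the causal intervention)."""
    return [a + alpha * d for a, d in zip(act, direction)]


# Synthetic stand-in for residual-stream activations (assumption, not real data):
# Gaussian noise plus a planted "preference" offset along the first coordinate.
rng = random.Random(0)
DIM = 16
true_dir = [1.0 if i == 0 else 0.0 for i in range(DIM)]


def sample(label):
    sign = 1.0 if label == 1 else -1.0
    return [rng.gauss(0.0, 1.0) + 2.0 * sign * t for t in true_dir]


labels = [i % 2 for i in range(400)]  # 1 = option chosen, 0 = not chosen
acts = [sample(y) for y in labels]

# "Predicts": project onto the probe direction and threshold at zero.
probe = fit_mean_difference_probe(acts, labels)
preds = [1 if dot(a, probe) > 0 else 0 for a in acts]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
cosine = dot(probe, true_dir)  # both vectors are unit length

# "Causally controls": steering along the probe shifts the score by alpha.
ALPHA = 6.0
example = acts[0]  # a class-0 example
score_before = dot(example, probe)
score_after = dot(steer(example, probe, ALPHA), probe)

print(f"accuracy={accuracy:.3f} cosine={cosine:.3f}")
print(f"score before={score_before:.2f} after={score_after:.2f}")
```

In a real experiment the activations would come from a fixed layer of the model, and the steered activation would be written back into the forward pass to see whether the model's actual choice changes. The sketch only shows the linear-algebra core of that idea.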
Tags: Gemini, Qwen, Reasoning, Alignment, Papers

Summary generated by Claude — human-verified