arXiv cs.AI · 19 May 2026

Probing Persona-Dependent Preferences in Language Models


In three lines: A study of internal preferences in LLMs using linear probes on residual-stream activations. The researchers identify a shared preference vector in Gemma-3-27B and Qwen-3.5-122B that predicts and causally controls model choices. This vector remains stable even when the model adopts radically different personas (for example, a helpful assistant versus an evil persona).
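
To make "linear probe" and "causally controls" concrete, here is a minimal sketch on synthetic data. It is not the paper's method or code: the activations are random vectors standing in for residual-stream states, the planted direction `true_dir`, the difference-of-means probe, and the steering strength `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n = 64, 2000  # hypothetical hidden size and sample count

# Hypothetical "preference" direction planted in the synthetic data.
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)

# Synthetic stand-ins for residual-stream activations: Gaussian noise plus a
# signed component along true_dir that encodes the simulated choice (0 or 1).
labels = rng.integers(0, 2, size=n)
signs = 2.0 * labels - 1.0
acts = rng.normal(size=(n, d_model)) + 2.0 * signs[:, None] * true_dir

# Difference-of-means probe: one simple way to fit a linear direction.
probe = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
probe /= np.linalg.norm(probe)


def predict(x):
    """Read out the simulated choice from the projection onto the probe."""
    return (x @ probe > 0).astype(int)


def steer(x, alpha):
    """Shift activations along the probe direction by strength alpha."""
    return x + alpha * probe


accuracy = float((predict(acts) == labels).mean())
cosine = float(probe @ true_dir)

# "Causal control" in this toy setting: push choice-0 activations along the
# probe and measure how many now read out as choice 1.
flipped = float(predict(steer(acts[labels == 0], alpha=6.0)).mean())

print(f"probe accuracy: {accuracy:.3f}")
print(f"cosine with planted direction: {cosine:.3f}")
print(f"fraction flipped by steering: {flipped:.3f}")
```

In a real model the readout would be the model's actual choice after a forward pass with the edited activations, rather than the probe's own prediction as in this toy version.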


Gemini Qwen Reasoning Alignment Papers

Summary generated by Claude — human-verified
