arXiv cs.CL·19 May 2026

Probing Persona-Dependent Preferences in Language Models

In three lines

Researchers identify a shared preference vector in Gemma-3-27B and Qwen-3.5-122B by training linear probes on residual-stream activations. This vector predicts and causally controls each model's task choices across different personas, including an evil persona, revealing a largely shared preference representation underlying different behavioral modes.
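For readers unfamiliar with linear probing, the following is a minimal sketch of the general technique, not the paper's actual pipeline. It assumes synthetic activations standing in for residual-stream vectors, a ridge-regression probe, and additive steering along the probe direction; the function names (`fit_linear_probe`, `probe_predict`, `steer`), the dimensions, and the regularization strength are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear_probe(acts, scores, l2=1.0):
    """Ridge-regress scalar preference scores onto activation vectors."""
    mean = acts.mean(axis=0)
    centered = acts - mean
    bias = scores.mean()
    d = centered.shape[1]
    # Closed-form ridge solution: (X^T X + l2 * I)^{-1} X^T y
    gram = centered.T @ centered + l2 * np.eye(d)
    w = np.linalg.solve(gram, centered.T @ (scores - bias))
    return w, mean, bias


def probe_predict(acts, w, mean, bias):
    """Read a predicted preference score off each activation vector."""
    return (acts - mean) @ w + bias


def steer(acts, w, alpha):
    """Shift activations by alpha along the unit-normalized probe direction."""
    return acts + alpha * w / np.linalg.norm(w)


# Synthetic stand-in data (assumption): scores depend on one hidden direction.
rng = np.random.default_rng(0)
d_model, n = 64, 500
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
acts = rng.normal(size=(n, d_model))
scores = acts @ true_dir + 0.1 * rng.normal(size=n)

# Fit on the first 400 examples, evaluate on the held-out 100.
w, mean, bias = fit_linear_probe(acts[:400], scores[:400])
pred = probe_predict(acts[400:], w, mean, bias)
corr = np.corrcoef(pred, scores[400:])[0, 1]
cosine = float(w @ true_dir / np.linalg.norm(w))

# Steering along the probe direction raises the probe's own readout.
shifted = probe_predict(steer(acts[400:], w, alpha=2.0), w, mean, bias)
```

In a real experiment the activations would be read from a chosen layer of the model and the steering vector added back during the forward pass; testing whether that changes the model's task choices is what separates a causal claim from a purely predictive one.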

Gemini Qwen Reasoning Alignment Papers

Summary generated by Claude — human-verified
