arXiv cs.AI · 25 May 2026

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

In three lines: PCSP is a single RL policy, conditioned on frozen LLM embeddings, that controls 300 NPCs with distinct personas. It achieves zero-shot persona identification at 17x above chance and a semantic-behavioral alignment of ρ=0.73, and it runs 22x faster than an LLM-as-policy baseline. It was deployed in UE5 with 64 agents at a low failure rate.

Reinforcement learning · AI Agents · Multi-agent · Benchmarks

Summary generated by Claude — human-verified
