arXiv cs.AI · 28 May 2026

Cross-Entropy Games and Frost Training

In three lines: Frost Training improves Monte Carlo-based policy optimization for a class of LLM-as-a-judge tasks called Cross-Entropy Games. The method exploits gradients of the reward function in embedding space, a signal borrowed from the GCG (Greedy Coordinate Gradient) jailbreaking attack. Validated with GRPO training, it lets the model learn to generate high-scoring outputs faster and reach higher maximum scores in best-of-k settings.
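To make the "reward gradients in embedding space" idea concrete, here is a minimal sketch of the GCG-style signal the summary refers to. It is not Frost Training's algorithm and not the paper's code: it assumes a toy differentiable reward, and the names `token_scores` and `gcg_style_step` are hypothetical. For each position, the gradient of the reward with respect to that token's embedding is dotted with every row of the embedding table; this linear estimate shortlists the top-k replacement tokens, and each shortlisted single-token swap is then scored with the true reward. Real GCG minimizes a loss and samples a random batch of swaps from the shortlists; this simplified version maximizes a reward and evaluates every shortlisted swap.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def token_scores(grad_at_position: Vector, embedding_table: List[Vector]) -> List[float]:
    """Linearized reward change for placing each vocabulary token v here: E[v] . dR/de."""
    return [dot(row, grad_at_position) for row in embedding_table]


def gcg_style_step(
    tokens: List[int],
    embedding_table: List[Vector],
    reward_fn: Callable[[List[int]], float],
    grad_fn: Callable[[List[int]], List[Vector]],
    top_k: int,
) -> Tuple[List[int], float]:
    """One greedy coordinate step (hypothetical helper): shortlist top_k tokens per
    position by gradient score, evaluate each single-token swap exactly, keep the best."""
    best_tokens, best_reward = list(tokens), reward_fn(tokens)
    grads = grad_fn(tokens)  # one embedding-space gradient per position
    for pos, g in enumerate(grads):
        scores = token_scores(g, embedding_table)
        shortlist = sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:top_k]
        for v in shortlist:
            candidate = list(tokens)
            candidate[pos] = v
            r = reward_fn(candidate)  # true reward, not the linear estimate
            if r > best_reward:
                best_tokens, best_reward = candidate, r
    return best_tokens, best_reward


# Toy setup (assumption): reward is linear in embeddings, R = sum_i w . E[tok_i],
# so the gradient with respect to every position's embedding is simply w.
E = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.0]]
w = [1.0, 0.5]


def toy_reward(toks: List[int]) -> float:
    return sum(dot(E[t], w) for t in toks)


def toy_grad(toks: List[int]) -> List[Vector]:
    return [list(w) for _ in toks]


new_tokens, new_reward = gcg_style_step([3, 1], E, toy_reward, toy_grad, top_k=2)
print(new_tokens, new_reward)
```

Starting from tokens `[3, 1]` (reward -0.5), the gradient scores shortlist tokens 2 and 0, and the best single swap replaces position 0 with token 2, giving `[2, 1]` with reward 3.5. With a nonlinear judge, the linear estimate only ranks candidates, which is why the exact reward is recomputed before accepting a swap.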

Reinforcement learning, Reasoning, Evals, AI safety

Summary generated by Claude — human-verified