arXiv cs.LG

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

Signal: 78
Hype: 25
In three lines: GEM (Geometric Entropy Mixing) reformulates LLM data curation as a variational problem on the hypersphere to prevent cluster collapse. It uses an MM algorithm with provable guarantees and teacher-student distillation to scale to web-sized corpora. Integrated with DoReMi and RegMix, it improves downstream accuracy by up to 1.2% on 1.1B-parameter models.
Papers · Benchmarks · Fine-tuning · Embeddings

Summary generated by Claude — human-verified