arXiv cs.CL

CodeAlchemy: Synthetic Code Rewriting at Scale

Signal: 82
Hype: 25
In three lines
CodeAlchemy generates 500B+ synthetic tokens from public code across 15 languages using five strategies: CodeEnhance, CodeQA, CodeDev, CodeDialogue, and CodeTrace.
CodeTrace instruments 1.3M+ files to capture control flow and library knowledge.
3B models outperform models 10x larger (Gemma-3 27B, Granite-4.0 32B), scoring 83.5% on HumanEval and 63.2% on MBPP.
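
To make "instrumenting a file to capture control flow" concrete, here is a minimal sketch in Python. It is not CodeAlchemy's CodeTrace pipeline, whose details this summary does not give; it only illustrates the general idea using the standard library's `sys.settrace`. The helper `trace_lines` and the sample function `classify` are hypothetical names introduced for illustration.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args) and record which lines of func were executed.

    Hypothetical helper for illustration only. Lines are reported as
    offsets from the function's `def` line.
    """
    executed = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function being traced.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return result, executed


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


if __name__ == "__main__":
    for value in (-1, 5):
        result, lines = trace_lines(classify, value)
        print(value, result, lines)
```

Calling `trace_lines(classify, -1)` and `trace_lines(classify, 5)` yields different line lists, because each input takes a different branch. Pairing source code with this kind of recorded execution path is one plausible way to expose control flow as training data.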
Code generation, Benchmarks, Fine-tuning, Papers

Summary generated by Claude, human-verified