CodeAlchemy: Synthetic Code Rewriting at Scale
Signal: 82
Hype: 25
In three lines
CodeAlchemy generates more than 500B synthetic tokens from public code in 15 languages using five strategies: CodeEnhance, CodeQA, CodeDev, CodeDialogue, and CodeTrace.
CodeTrace instruments more than 1.3M files to capture control flow and library knowledge.
Models with 3B parameters outperform models roughly 10x larger (Gemma-3 27B, Granite-4.0 32B), scoring 83.5% on HumanEval and 63.2% on MBPP.
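The summary does not say how CodeTrace instruments a file, so the following is only a generic illustration of what "capturing control flow" can look like, not the authors' method. It is a minimal sketch that uses Python's standard `sys.settrace` hook to record which lines of a function execute; the names `trace_lines` and `classify` are hypothetical and exist only for this example.

```python
import sys


def trace_lines(func, *args, **kwargs):
    """Run func and record the line offsets (relative to its def line) that execute."""
    events = []
    code = func.__code__
    previous = sys.gettrace()  # remember any tracer that was already installed

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            events.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier tracer
    return result, events


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


# Each input takes a different branch, so the recorded offsets differ.
print(trace_lines(classify, -1))  # ('negative', [1, 2])
print(trace_lines(classify, 5))   # ('non-negative', [1, 3])
```

A trace like this pairs source code with the path actually taken for a given input, which is the kind of execution signal a trace-based rewriting strategy could turn into training text.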
Summary generated by Claude — human-verified