Back to feed
Reddit r/MachineLearning·

AI-generated CUDA kernels silently break training and inference [R]

Signal
75
Hype
35
In three linesNVIDIA released SOL-ExecBench (235 production CUDA kernels). Top-ranked AI-generated kernels fail in real training: a fused embedding-gradient+RMSNorm backward kernel accumulates in bf16 instead of fp32, causing loss divergence masked by AdamW but visible with SGD.
Read source
Your take?
BenchmarksCode generationAI safety

Summary generated by Claude — human-verified