CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
Signal: 78
Hype: 15
In three lines
CODA is a GPU kernel abstraction that rewrites Transformer blocks as GEMM-epilogue programs. It fuses memory-bound operations (normalization, activations, residuals) into the GEMM output stage, applying them before the result is written to memory, which reduces data movement. It covers nearly all non-attention computation in the forward and backward passes.
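To make the fusion idea concrete, here is a minimal pure-Python sketch. It is not CODA's API or implementation: the function names, the choice of GELU as the activation, and the traffic counter are all assumptions made for illustration. The unfused path writes the GEMM output and then makes separate passes over it for the activation and the residual add. The fused path applies both to each accumulator value before the single write. Normalization is left out because it needs a row-wise reduction that this toy does not model.

```python
import math


def gelu(v):
    # tanh approximation of GELU; the activation choice is an assumption
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def unfused_block(x, w, residual):
    """GEMM followed by separate elementwise passes over the full output."""
    n, k, m = len(x), len(w), len(w[0])
    traffic = 0  # toy count of output-sized element reads and writes

    # Pass 1: GEMM, result written out in full.
    y = [[sum(x[i][p] * w[p][j] for p in range(k)) for j in range(m)] for i in range(n)]
    traffic += n * m  # write y

    # Pass 2: activation re-reads and re-writes the whole matrix.
    y = [[gelu(v) for v in row] for row in y]
    traffic += 2 * n * m  # read y, write y

    # Pass 3: residual add reads y and the residual, then writes again.
    y = [[y[i][j] + residual[i][j] for j in range(m)] for i in range(n)]
    traffic += 3 * n * m  # read y, read residual, write y
    return y, traffic


def fused_block(x, w, residual):
    """GEMM whose epilogue applies activation and residual before the only write."""
    n, k, m = len(x), len(w), len(w[0])
    traffic = 0
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0  # stands in for an on-chip accumulator
            for p in range(k):
                acc += x[i][p] * w[p][j]
            # Epilogue: runs while acc has not yet been written to memory.
            out[i][j] = gelu(acc) + residual[i][j]
            traffic += 2  # read one residual element, write one output element
    return out, traffic
```

Under this toy counting model, both paths produce the same values, but the unfused path touches six output-sized elements per entry and the fused path touches two. Real kernels operate on tiles rather than single elements, and actual savings depend on the hardware and the operations being fused.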
Summary generated by Claude — human-verified