Reddit r/LocalLLaMA · 22 May 2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

In three lines: CODA is a GPU kernel abstraction that rewrites Transformer blocks as GEMM-epilogue programs. It fuses memory-bound operations (normalization, activations, residuals) into the GEMM epilogue, applying them to the GEMM output before it is written to memory, which reduces data movement. It covers nearly all non-attention computation in the forward and backward passes.
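
To make the idea of epilogue fusion concrete, here is a minimal pure-Python sketch, not CODA's actual API or kernels. It compares a GEMM followed by a separate activation-plus-residual pass against a version that applies the same operations to each output tile before storing it. All names (`matmul_tile`, `unfused`, `fused`, the `traffic` counter) are hypothetical, and the "memory traffic" is only a simplified count of output-sized element reads and writes. Normalization is left out because it needs a row-wise reduction, which this toy does not model.

```python
# Illustrative sketch only: hypothetical names, not CODA's API.
import math


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matmul_tile(A, B, r0, r1, c0, c1):
    """Compute the tile C[r0:r1, c0:c1] of A @ B into a local accumulator."""
    K = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(c0, c1)]
            for i in range(r0, r1)]


def unfused(A, B, residual, tile=2):
    """GEMM stores C, then a second pass re-reads C for activation + residual."""
    M, N = len(A), len(B[0])
    traffic = 0  # simplified count of output-sized element reads/writes
    C = [[0.0] * N for _ in range(M)]
    for r0 in range(0, M, tile):
        for c0 in range(0, N, tile):
            r1, c1 = min(r0 + tile, M), min(c0 + tile, N)
            acc = matmul_tile(A, B, r0, r1, c0, c1)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    C[i][j] = acc[i - r0][j - c0]
                    traffic += 1  # write raw GEMM output
    out = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            out[i][j] = gelu(C[i][j]) + residual[i][j]
            traffic += 3  # read C, read residual, write result
    return out, traffic


def fused(A, B, residual, tile=2):
    """Apply activation + residual to each tile while it is still local."""
    M, N = len(A), len(B[0])
    traffic = 0
    out = [[0.0] * N for _ in range(M)]
    for r0 in range(0, M, tile):
        for c0 in range(0, N, tile):
            r1, c1 = min(r0 + tile, M), min(c0 + tile, N)
            acc = matmul_tile(A, B, r0, r1, c0, c1)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    # Epilogue: runs on the accumulator before the store.
                    out[i][j] = gelu(acc[i - r0][j - c0]) + residual[i][j]
                    traffic += 2  # read residual, write result
    return out, traffic


# Small made-up inputs: A is 4x3, B is 3x4, R is the 4x4 residual.
A = [[0.1 * (i + k) for k in range(3)] for i in range(4)]
B = [[0.2 * (k - j) for j in range(4)] for k in range(3)]
R = [[1.0 for _ in range(4)] for _ in range(4)]

out_unfused, traffic_unfused = unfused(A, B, R)
out_fused, traffic_fused = fused(A, B, R)
```

In this toy model both versions produce the same result, but the fused one never stores or re-reads the intermediate GEMM output, so its counted traffic is half that of the unfused version. Real GPU kernels involve registers, shared memory, and many more details that this sketch does not capture.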

Infrastructure · Benchmarks · Code generation

Summary generated by Claude — human-verified
