Reddit r/MachineLearning·23 May 2026

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

In three lines: SM1 is a Mamba1 variant with d_state=1 that replaces the selective scan with two native PyTorch ops. The result is an exact closed-form solution, not an approximation, and it reduces scan memory 16x compared with Mamba1 at d_state=16. Inference state is 14 KB for the 130M model with O(1) cost per token; the training data is 163K MIDI files (2.5B tokens).
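The post does not say which two ops SM1 uses. As a hedged sketch of why d_state=1 admits an exact closed form: with one state per channel, the recurrence h_t = a_t * h_{t-1} + b_t * x_t unrolls to h_t = P_t * sum_{s<=t} (b_s * x_s / P_s), where P_t = a_1 * ... * a_t. The PyTorch code below assumes `torch.cumsum` and `torch.exp` as the core ops and a Mamba-style decay a_t = exp(delta_t * A) with A < 0. The function names are hypothetical and this is not the author's implementation; the sequential version is included only as a reference and to show that token-by-token inference carries a fixed-size state.

```python
import torch


def scan_closed_form(log_a: torch.Tensor, bx: torch.Tensor) -> torch.Tensor:
    """Parallel solution of h_t = a_t * h_{t-1} + bx_t, zero initial state, d_state = 1.

    log_a: (batch, length, channels), log of the per-step decay (e.g. delta * A, A < 0)
    bx:    (batch, length, channels), the input term B_t * x_t
    """
    # log P_t = log a_1 + ... + log a_t  (cumulative decay in log space)
    log_p = torch.cumsum(log_a, dim=1)
    p = torch.exp(log_p)
    # h_t = P_t * sum_{s<=t} bx_s / P_s; dividing by P_s can overflow on long sequences
    return p * torch.cumsum(bx / p, dim=1)


def scan_sequential(log_a: torch.Tensor, bx: torch.Tensor) -> torch.Tensor:
    """Token-by-token reference: carries only a (batch, channels) state, O(1) per step."""
    h = torch.zeros_like(bx[:, 0])
    outputs = []
    for t in range(bx.shape[1]):
        h = torch.exp(log_a[:, t]) * h + bx[:, t]
        outputs.append(h)
    return torch.stack(outputs, dim=1)


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, length, channels = 2, 64, 8
    delta = torch.rand(batch, length, channels, dtype=torch.float64) * 0.1
    A = -torch.rand(channels, dtype=torch.float64)  # non-positive, so each a_t lies in (0, 1]
    log_a = delta * A
    bx = torch.randn(batch, length, channels, dtype=torch.float64)
    # The parallel closed form and the recurrence agree up to floating-point error
    print(torch.allclose(scan_closed_form(log_a, bx), scan_sequential(log_a, bx)))
```

Because P_t shrinks as decay accumulates, dividing by it can overflow on long sequences, so a practical version would likely process the sequence in chunks. The memory claim is consistent with the shapes involved: a d_state=N scan carries a (batch, length, channels, N) state, while this form only materializes (batch, length, channels) tensors, which is one way to read the 16x figure versus d_state=16.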

Summary generated by Claude, human-verified.