Reddit r/MachineLearning

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

Signal: 72 | Hype: 25
In three lines: SM1 is a Mamba1 variant with d_state=1 that replaces the selective scan with two native PyTorch ops. The result is an exact closed-form solution, not an approximation, and it reduces scan memory 16x versus Mamba1 (d_state=16). Inference state is 14 KB for a 130M model, O(1) per token. Training uses 163K MIDI files (2.5B tokens).
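To illustrate how a scalar-state scan can have an exact closed form, here is a minimal sketch. It assumes the recurrence is h_t = a_t * h_(t-1) + b_t with h_0 = 0, and it assumes `torch.cumprod` and `torch.cumsum` as the two ops. The post does not name the ops SM1 actually uses, so this is a guess at the idea, not the author's implementation. The function names and tensor shapes are hypothetical.

```python
import torch

def scan_sequential(a, b):
    """Reference loop: h_t = a_t * h_{t-1} + b_t, with h_0 = 0."""
    h = torch.zeros_like(b[..., 0])
    out = []
    for t in range(b.shape[-1]):
        h = a[..., t] * h + b[..., t]
        out.append(h)
    return torch.stack(out, dim=-1)

def scan_closed_form(a, b):
    """Same recurrence without a loop (assumed ops: cumprod and cumsum)."""
    p = torch.cumprod(a, dim=-1)             # P_t = a_1 * ... * a_t
    return p * torch.cumsum(b / p, dim=-1)   # h_t = P_t * sum_{s<=t} b_s / P_s

torch.manual_seed(0)
# Hypothetical shapes: (batch, channels, time); decay factors kept in [0.5, 1).
a = torch.rand(2, 4, 32, dtype=torch.float64) * 0.5 + 0.5
b = torch.randn(2, 4, 32, dtype=torch.float64)

h_loop = scan_sequential(a, b)
h_closed = scan_closed_form(a, b)
max_err = (h_loop - h_closed).abs().max().item()
print(max_err)
```

The two functions agree up to floating-point rounding, which is what "exact, not an approximation" means here. Dividing by a cumulative product can underflow on long sequences, so a practical version would likely work in chunks or in log space. With d_state=1 the scan carries one value per channel rather than 16, which is consistent with the 16x figure quoted above.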
Tags: Open source, Code generation, Reasoning, Infrastructure

Summary generated by Claude — human-verified