Back to feed
Reddit r/LocalLLaMA·

Granite 4.1 Architecture Changes?

Signal
35
Hype
15
In three linesA r/LocalLLaMA user questions IBM's decision to return to pure transformer architecture for Granite 4.1, abandoning Granite 4's hybrid mamba-attention design. On modest hardware (8GB VRAM), Granite 4 delivered 128k context at ~1000 tok/s ingestion, while Granite 4.1 caps at 14k context and ~300 tok/s. User asks whether IBM will continue offering mamba architecture.
Read source
Your take?
Open sourceReasoning

Summary generated by Claude — human-verified