Reddit r/LocalLLaMA·28 May 2026

Granite 4.1 Architecture Changes?

In three lines

An r/LocalLLaMA user questions IBM's decision to return to a pure transformer architecture for Granite 4.1, abandoning Granite 4's hybrid Mamba-attention design. On modest hardware (8 GB VRAM), the user reports that Granite 4 delivered 128k context at ~1000 tok/s prompt ingestion, while Granite 4.1 caps at 14k context and ~300 tok/s. The user asks whether IBM will continue to offer Mamba-based models.

Open source Reasoning

Summary generated by Claude — human-verified

