Reddit r/LocalLLaMA · 21 May 2026

Latest b9274 Addresses MTP VRAM leak


In three lines: Commit b9274 fixes a VRAM leak affecting MTP (Multi-Token Prediction) models. The destroy() function failed to free the speculative decoder, draft context, and draft model, so memory accumulated on each sleep/resume cycle. The fix explicitly resets these components before llama_init.
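
To make the failure mode concrete, here is a toy Python simulation of the leak pattern described above. It is not llama.cpp code. The class names, the byte sizes, and the `fixed` flag are all hypothetical, and the release order in the fixed path (draft-side components first) is an assumption based on the summary's wording. The sketch only shows why forgetting three resources in a teardown routine makes usage grow with each sleep/resume cycle.

```python
class FakeGpu:
    """Toy allocator that only counts how many bytes are 'in use'."""

    def __init__(self):
        self.used = 0

    def alloc(self, nbytes):
        self.used += nbytes
        return nbytes  # the 'handle' is just the size

    def free(self, nbytes):
        self.used -= nbytes


class Server:
    # Hypothetical component sizes in arbitrary units.
    SIZES = {
        "model": 4000,
        "ctx": 1000,
        "draft_model": 500,
        "draft_ctx": 200,
        "spec": 50,
    }
    # The three components the summary says destroy() forgot.
    DRAFT_SIDE = ["spec", "draft_ctx", "draft_model"]

    def __init__(self, gpu, fixed):
        self.gpu = gpu
        self.fixed = fixed
        self.handles = {}

    def init(self):
        # Allocating overwrites any stale handle, so memory that was
        # not freed beforehand can no longer be reached: it has leaked.
        for name, size in self.SIZES.items():
            self.handles[name] = self.gpu.alloc(size)

    def destroy(self):
        names = ["model", "ctx"]
        if self.fixed:
            # Assumed order: release draft-side components first.
            names = self.DRAFT_SIDE + names
        for name in names:
            size = self.handles.pop(name, None)
            if size is not None:
                self.gpu.free(size)


def run_cycles(fixed, cycles):
    """Return bytes still in use after `cycles` sleep/resume cycles."""
    gpu = FakeGpu()
    server = Server(gpu, fixed)
    server.init()
    for _ in range(cycles):
        server.destroy()  # sleep
        server.init()     # resume
    server.destroy()
    return gpu.used


if __name__ == "__main__":
    print("leaky:", run_cycles(fixed=False, cycles=3))
    print("fixed:", run_cycles(fixed=True, cycles=3))
```

In this model the leaky variant strands 750 units (the three draft-side components) on every teardown, while the fixed variant returns to zero.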


Llama · Code generation · Infrastructure

Summary generated by Claude — human-verified
