Reddit r/LocalLLaMA

Latest b9274 Addresses MTP VRAM leak

Signal: 72
Hype: 15
In three lines: Commit b9274 fixes a VRAM leak in MTP (Multi-Token Prediction) models. The destroy() function failed to free the speculative decoder, draft context, and draft model, so VRAM accumulated on each sleep/resume cycle. The fix explicitly resets these components before llama_init.
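The leak pattern described above can be illustrated with a minimal C++ sketch. It is not the code from commit b9274: `Resource`, `ServerContext`, its member names, and `run_cycles` are all hypothetical stand-ins, and a global counter replaces real VRAM so the effect is observable without a GPU. It assumes the draft-side objects are owned separately from the main model, which is why a teardown that only releases the main model leaves them behind.

```cpp
#include <memory>

// Counts live Resource objects; a stand-in for VRAM usage (assumption).
int g_live = 0;

// Hypothetical GPU-backed resource, not a llama.cpp type.
struct Resource {
    Resource()  { ++g_live; }
    ~Resource() { --g_live; }
    Resource(const Resource &) = delete;
    Resource & operator=(const Resource &) = delete;
};

// Hypothetical server context; member names are illustrative only.
struct ServerContext {
    std::unique_ptr<Resource> main_model;
    // Raw owning pointers: nothing frees them unless destroy() does.
    Resource * spec_decoder = nullptr;
    Resource * draft_ctx    = nullptr;
    Resource * draft_model  = nullptr;

    // Called on every resume: allocates all four resources.
    void init() {
        main_model   = std::make_unique<Resource>();
        draft_model  = new Resource();
        draft_ctx    = new Resource();
        spec_decoder = new Resource();
    }

    // Called on every sleep. With free_draft == false this mimics the
    // leaky behaviour: only the main model is released.
    void destroy(bool free_draft) {
        if (free_draft) {
            // Release dependents first: decoder, then context, then model.
            delete spec_decoder; spec_decoder = nullptr;
            delete draft_ctx;    draft_ctx    = nullptr;
            delete draft_model;  draft_model  = nullptr;
        }
        main_model.reset();
    }
};

// Runs n sleep/resume cycles and returns how many resources were leaked.
int run_cycles(int n, bool free_draft) {
    const int before = g_live;
    ServerContext ctx;
    for (int i = 0; i < n; ++i) {
        ctx.init();
        ctx.destroy(free_draft);
    }
    return g_live - before;
}
```

In this sketch, `run_cycles(3, false)` returns 9 (three leaked objects per cycle) while `run_cycles(3, true)` returns 0. Holding the draft-side objects in `std::unique_ptr` would make the cleanup harder to forget, though the release order would still need to be explicit.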
Llama · Code generation · Infrastructure

Summary generated by Claude — human-verified