arXiv cs.LG·29 May 2026

Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

In three lines: KOFF decomposes LLMs into a sparse shared backbone and domain-specific external memory modules. On Llama and Qwen models (3B-8B parameters), the framework preserves performance at 12% global sparsity, using LoRA adapters and learned KV caches as the memories. Pruning without the memory modules, by contrast, degrades performance sharply.
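To make the decomposition concrete, here is a minimal pure-Python sketch of the two ingredients named in the summary: a backbone pruned to a global sparsity target, and a LoRA-style low-rank update serving as a domain memory. It is not the paper's implementation. The magnitude-based pruning criterion, the function names (`global_magnitude_prune`, `apply_lora`), and the toy matrices are all assumptions for illustration; the summary does not say how KOFF selects weights to prune, and the learned KV caches are omitted here.

```python
from typing import Dict, List

Matrix = List[List[float]]


def global_magnitude_prune(weights: Dict[str, Matrix], sparsity: float) -> Dict[str, Matrix]:
    """Zero the smallest-magnitude fraction of entries, ranked across ALL matrices.

    Assumption: magnitude ranking is a stand-in criterion, not necessarily KOFF's.
    Returns new matrices; the input is left untouched.
    """
    entries = sorted(
        (abs(v), name, i, j)
        for name, m in weights.items()
        for i, row in enumerate(m)
        for j, v in enumerate(row)
    )
    n_prune = round(len(entries) * sparsity)
    pruned = {name: [row[:] for row in m] for name, m in weights.items()}
    for _, name, i, j in entries[:n_prune]:
        pruned[name][i][j] = 0.0
    return pruned


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w_sparse: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float = 1.0) -> Matrix:
    """Effective weight = sparse backbone + scale * (B @ A), the low-rank domain memory."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w_sparse, delta)]


# Toy example: one 2x2 layer, 50% sparsity, and a rank-1 hypothetical "domain" adapter.
backbone = {"layer0": [[0.1, -2.0], [3.0, -0.05]]}
sparse = global_magnitude_prune(backbone, sparsity=0.5)
effective = apply_lora(sparse["layer0"], lora_b=[[1.0], [0.0]], lora_a=[[0.5, 0.5]])
```

In this sketch the sparse backbone is shared, and only the small `lora_b` and `lora_a` factors would differ per domain, which is the sense in which knowledge is "offloaded" into a memory module.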

Tags: Llama, Qwen, Fine-tuning, Papers

Summary generated by Claude — human-verified
