arXiv cs.LG

Knowledge Offloading: Decomposing LLMs into Sparse Backbones and Memory Modules

Signal
72
Hype
18
In three lines
KOFF decomposes LLMs into sparse shared backbones and domain-specific external memory modules. On Llama and Qwen models (3B-8B), the framework preserves performance at 12% global sparsity using LoRA adapters and learned KV caches, while pruning without memories degrades sharply.
Llama, Qwen, Fine-tuning, Papers

Summary generated by Claude — human-verified