Reddit r/LocalLLaMA

Dynamic KV Cache Quantization and Load-on-demand mmproj/MTP: my llama.cpp wishlist

Signal
65
Hype
25
In three lines
A developer proposes two optimizations for llama.cpp: dynamic KV cache quantization and on-demand mmproj loading. A proof-of-concept implementation adds an HTTP endpoint, /requantize_kvcache, that switches configuration (quantized or f16 KV cache, mmproj on or off) without a full model reload. It was tested on an RTX 5090 with Qwen3.5-27B Q6_K.
Llama · Infrastructure · Open source

Summary generated by Claude — human-verified