Experts first llama.cpp
Signal: 65
Hype: 25
In three lines
An experimental llama.cpp fork that optimizes mixture-of-experts (MoE) models for GPUs with 12GB of VRAM. The author loads selected experts into VRAM instead of full layers, reaching 26 tk/s on an RTX 2060 (vs. 19 tk/s with the default) at a 62% hit rate. The author is seeking testers with a 3060 or 4060.
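To make the idea of loading experts rather than full layers concrete, here is a minimal Python sketch. It is not the fork's code: the function names, the (layer, expert) keys, the greedy "most frequently routed first" policy, and the fixed per-expert size are all assumptions made for illustration. It also assumes that "hit rate" means the share of routed expert calls that find their expert already in VRAM, which the summary does not spell out.

```python
from collections import Counter


def pick_resident_experts(activation_counts, expert_bytes, vram_budget_bytes):
    """Greedily keep the most frequently routed experts in VRAM.

    activation_counts maps a hypothetical (layer, expert_id) key to how often
    the router selected that expert. Every expert is assumed to be the same size.
    """
    resident = set()
    used = 0
    # Sort by descending count, then by key so ties are broken deterministically.
    ranked = sorted(activation_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for expert, _count in ranked:
        if used + expert_bytes > vram_budget_bytes:
            break
        resident.add(expert)
        used += expert_bytes
    return resident


def hit_rate(routing_trace, resident):
    """Fraction of routed expert calls whose expert is already in VRAM."""
    if not routing_trace:
        return 0.0
    hits = sum(1 for expert in routing_trace if expert in resident)
    return hits / len(routing_trace)


# Toy routing trace: each entry is one (layer, expert_id) chosen by the router.
trace = [(0, 1), (0, 1), (0, 2), (1, 0), (0, 1), (1, 3)]
counts = Counter(trace)

# Pretend each expert takes 1 unit of memory and VRAM has room for 2 of them.
resident = pick_resident_experts(counts, expert_bytes=1, vram_budget_bytes=2)
rate = hit_rate(trace, resident)
```

In this toy trace, two resident experts serve four of the six routed calls. A real implementation would also have to handle misses, for example by running the missing expert on the CPU, which this sketch leaves out.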
Summary generated by Claude — human-verified