Reddit r/LocalLLaMA·11 June 2026

Reviewing speed optimizations on llamacpp for large MoE models on multiGPU rigs? (fitparams vs -ngl/-ncmoe vs other flags, P2P, overclocking)


In three lines

Discussion of speed optimizations for llama.cpp with large MoE models on multi-GPU setups. The author explores the -ngl, -ncmoe, -fitt, and -ub flags and their impact on throughput (prompt processing rises from 50 to 120 tps). The post also asks how relevant these optimizations are to AI career prospects.


Llama Open source Infrastructure Code generation

Summary generated by Claude — human-verified
