Reddit r/LocalLLaMA · 20 May 2026

Do you think there is room for optimization? llama.cpp/qwen3.6 27b on two 6000 Blackwell

In 3 lines

A user runs Qwen3.6-27B through llama.cpp on two Blackwell 6000 MaxQ GPUs with an AMD Epyc CPU and gets 100-110 t/s. They are looking for optimizations: the cards run at 250/300 W and 20 GB of VRAM is still free. The configuration includes flash-attention, speculative decoding (draft-MTP), a batch size of 6144, and a 1M context.

Llama · Open source · Code generation · Infrastructure

Summary generated by Claude, verified by a human
