Reddit r/LocalLLaMA · 25 May 2026

Update on 12x32gb sxm v100 cluster / local AI for legal drafting

In three lines: A lawyer shares his experience running a cluster of twelve 32 GB V100-SXM2 GPUs for local legal document drafting. After abandoning vLLM because of incompatibilities between the Volta-generation GPUs and MoE models, he switched to llama.cpp with Gemma-4-26B and Qwen3.5-122B. Dense models run inefficiently on the V100s (~20-28 tok/s), while MoE models reach 50-113 tok/s decode on long-context legal prompts.

Llama, Open source, Infrastructure, Code generation, Reasoning

Summary generated by Claude — human-verified