Reddit r/LocalLLaMA

nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face

Signal: 75
Hype: 15
In three lines: NVIDIA quantized Alibaba's Qwen3.6-35B-A3B model to NVFP4 (a 4-bit floating-point format) using Model Optimizer. Reducing the weights from 16 to 4 bits per parameter cuts GPU memory and disk size by ~3.06x overall, somewhat below the nominal 4x. Benchmark results show minimal accuracy loss: MMLU Pro 85.6→85.0, GPQA Diamond 84.9→84.8.
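The figures in the summary can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes a 16-bit baseline and takes the total parameter count to be roughly 35 billion based on the model name (an assumption, not a figure from the model card). The helper names are hypothetical, and the resulting sizes are rough estimates, not measured values.

```python
# Back-of-the-envelope check of the summary's compression claim.
# Assumptions (not from the source): ~35e9 total parameters, inferred from
# the model name, and a uniform 16-bit baseline for every parameter.

def effective_bits(baseline_bits: float, compression_ratio: float) -> float:
    """Average bits per parameter implied by an overall compression ratio."""
    return baseline_bits / compression_ratio


def footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


BASELINE_BITS = 16        # stated in the summary
REPORTED_RATIO = 3.06     # stated in the summary
ASSUMED_PARAMS = 35e9     # assumption based on the "35B" in the model name

eff_bits = effective_bits(BASELINE_BITS, REPORTED_RATIO)
baseline_gb = footprint_gb(ASSUMED_PARAMS, BASELINE_BITS)
quantized_gb = footprint_gb(ASSUMED_PARAMS, eff_bits)

# Accuracy deltas quoted in the summary (original score, quantized score).
scores = {"MMLU Pro": (85.6, 85.0), "GPQA Diamond": (84.9, 84.8)}
drops = {name: round(before - after, 1) for name, (before, after) in scores.items()}

print(f"effective bits per parameter: {eff_bits:.2f}")
print(f"estimated 16-bit footprint:   {baseline_gb:.1f} GB")
print(f"estimated NVFP4 footprint:    {quantized_gb:.1f} GB")
print(f"accuracy drops (points):      {drops}")
```

An overall ratio of 3.06x works out to a little over 5 bits per parameter on average rather than exactly 4. That would be consistent with some tensors staying at higher precision or with per-block scaling metadata adding overhead, though the summary does not say which.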
Qwen · Fine-tuning · Benchmarks · Infrastructure

Summary generated by Claude — human-verified