Reddit r/LocalLLaMA

Opinions/improvements for my Qwen3.6-35B-A3B-FP8 + Hermes Agent setup on NVIDIA DGX Spark?

In three lines: A user deploys Qwen3.6-35B-A3B-FP8 with Hermes Agent on an NVIDIA DGX Spark via vLLM. The setup uses a 262k-token context, an FP8 KV cache, FlashInfer, prefix caching, chunked prefill, and speculative decoding (Qwen3 MTP). The user is asking for feedback on stability and possible optimizations.
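
For readers who want to picture the setup summarized above, here is a minimal sketch of how such a vLLM launch command might be assembled. It is not the poster's actual configuration. The Hugging Face model id, the exact context length of 262144 tokens, the speculative method name "qwen3_next_mtp", and the count of 2 speculative tokens are all assumptions chosen for illustration. The environment variable used to select FlashInfer may also differ between vLLM versions, so check the documentation for the version you run.

```python
import json
import shlex


def build_vllm_command(
    model="Qwen/Qwen3.6-35B-A3B-FP8",  # assumed repo id, not confirmed by the post
    max_model_len=262144,              # assumed exact value behind "262k"
    mtp_method="qwen3_next_mtp",       # assumed method name; verify for your vLLM version
    num_speculative_tokens=2,          # assumed; the post does not state a value
):
    """Return the argv list for a `vllm serve` launch mirroring the summary."""
    speculative_config = json.dumps(
        {"method": mtp_method, "num_speculative_tokens": num_speculative_tokens}
    )
    return [
        "vllm", "serve", model,
        "--max-model-len", str(max_model_len),
        "--kv-cache-dtype", "fp8",        # FP8 KV cache
        "--enable-prefix-caching",        # reuse KV blocks across shared prefixes
        "--enable-chunked-prefill",       # split long prefills into chunks
        "--speculative-config", speculative_config,
    ]


# Assumed way to request the FlashInfer attention backend; newer vLLM
# releases may expect a different mechanism.
EXTRA_ENV = {"VLLM_ATTENTION_BACKEND": "FLASHINFER"}

if __name__ == "__main__":
    env_prefix = " ".join(f"{k}={v}" for k, v in EXTRA_ENV.items())
    print(env_prefix, shlex.join(build_vllm_command()))
```

Running the script only prints the command line, which makes it easy to review each flag before launching anything on the DGX Spark.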
Tags: Qwen, AI Agents, Infrastructure, Code generation

Summary generated by Claude — human-verified