Reddit r/LocalLLaMA · 28 May 2026

Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)

In three lines

Krasis v1.0, an LLM runtime for models that exceed VRAM, reaches 12.48 tokens/s when generating with Qwen3.6-35B-A3B (Q4) on a laptop with an RTX 3070 Mobile 8GB. It is implemented fully in Rust (no Python in the hot path) and optimizes prefill and decode separately. Benchmarks, given as prompt processing (pp) and token generation (tg) in tokens/s: 222 pp and 12.48 tg on the laptop; 10,030 pp and 124.9 tg on an RTX 5090 32GB.
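As a rough way to read these figures, the sketch below estimates wall-clock time for a single request. It assumes pp and tg are prompt-processing and token-generation throughput in tokens/s, and that the two phases simply run back to back. The function name and the example token counts are hypothetical and do not come from the post.

```python
def estimate_seconds(prompt_tokens: int, output_tokens: int,
                     pp_rate: float, tg_rate: float) -> float:
    """Rough request time: prefill time plus decode time.

    Ignores model load, sampling overhead, and any slowdown at long context.
    """
    if pp_rate <= 0 or tg_rate <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / pp_rate + output_tokens / tg_rate


# Rates reported in the summary (tokens/s).
LAPTOP = {"pp_rate": 222.0, "tg_rate": 12.48}     # RTX 3070 Mobile 8GB
DESKTOP = {"pp_rate": 10030.0, "tg_rate": 124.9}  # RTX 5090 32GB

if __name__ == "__main__":
    # Hypothetical workload: 2,000-token prompt, 500-token reply.
    for name, rates in (("laptop", LAPTOP), ("desktop", DESKTOP)):
        print(f"{name}: {estimate_seconds(2000, 500, **rates):.1f} s")
```

With the laptop figures, decode time dominates for a reply of this size, which fits the post's "reading speed" framing: the reply streams at about 12 tokens/s once the prompt has been processed.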

Tags: Qwen, Infrastructure, Open source, Code generation

Summary generated by Claude — human-verified
