Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM)
Signal: 72
Hype: 25
In three lines
Krasis v1.0, an LLM runtime for models that exceed VRAM, reaches 12.48 tokens/s on an RTX 3070 Mobile 8GB running Qwen3.6-35B-A3B (Q4).
It is implemented entirely in Rust (no Python in the hot path), with separate optimizations for prefill and decode.
Benchmarks: 222 pp (prompt processing) and 12.48 tg (token generation) on the laptop; 10,030 pp and 124.9 tg on an RTX 5090 32GB.
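To put the benchmark figures in terms of waiting time, here is a rough sketch that converts the reported pp and tg rates into an estimated request latency. It assumes both figures are in tokens/s, that prefill and decode run back to back at constant rates, and that the request size (1,000 prompt tokens, 200 generated tokens) is a hypothetical example. The function and constant names are illustrative and are not part of Krasis.

```rust
/// Reported rates from the summary, assumed to be in tokens per second.
const LAPTOP_PP: f64 = 222.0; // RTX 3070 Mobile 8GB, prompt processing
const LAPTOP_TG: f64 = 12.48; // RTX 3070 Mobile 8GB, token generation
const DESKTOP_PP: f64 = 10_030.0; // RTX 5090 32GB, prompt processing
const DESKTOP_TG: f64 = 124.9; // RTX 5090 32GB, token generation

/// Estimated wall-clock seconds for one request.
/// Simplification: prefill time plus decode time, each at a constant rate.
fn estimate_seconds(prompt_tokens: u32, output_tokens: u32, pp_rate: f64, tg_rate: f64) -> f64 {
    let prefill = f64::from(prompt_tokens) / pp_rate;
    let decode = f64::from(output_tokens) / tg_rate;
    prefill + decode
}

/// Hypothetical request: 1,000 prompt tokens in, 200 tokens out.
fn laptop_estimate() -> f64 {
    estimate_seconds(1_000, 200, LAPTOP_PP, LAPTOP_TG)
}

/// Same hypothetical request on the RTX 5090 figures.
fn desktop_estimate() -> f64 {
    estimate_seconds(1_000, 200, DESKTOP_PP, DESKTOP_TG)
}
```

Under these assumptions the hypothetical request takes roughly 20.5 s on the laptop and roughly 1.7 s on the RTX 5090. On the laptop, decode accounts for most of that time, which is why the tg figure is the one quoted as "reading speed".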
Summary generated by Claude — human-verified