Reddit r/LocalLLaMA · 22 May 2026

BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.


In three lines: BeeLlama v0.2.0 delivers major performance gains from its DFlash update. On a single RTX 3090, Qwen 3.6 27B reaches up to 164 tps (4.40x speedup) and Gemma 4 31B up to 177.8 tps (4.93x), with prompt processing speed near baseline. The release also adds full Gemma 4 31B support, reduces DFlash overhead, improves prefill handling, and tightens draft/target validation.
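As a rough sanity check on those figures, the quoted speedups imply a baseline generation rate of about 36 to 37 tps for both models. The sketch below assumes each speedup is simply accelerated tps divided by baseline tps, measured on the same workload; the post does not spell out the methodology, and the "up to" figures are peaks, so treat the results as estimates. The function name `implied_baseline_tps` is illustrative and not part of BeeLlama.

```python
def implied_baseline_tps(accelerated_tps: float, speedup: float) -> float:
    """Baseline tokens/sec implied by an accelerated rate and its speedup factor.

    Assumes speedup = accelerated_tps / baseline_tps (an assumption; the post
    does not state how the speedup was computed).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_tps / speedup


# Peak ("up to") figures quoted in the post for a single RTX 3090.
reported = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

baselines = {
    model: implied_baseline_tps(tps, speedup)
    for model, (tps, speedup) in reported.items()
}

for model, tps in baselines.items():
    print(f"{model}: ~{tps:.1f} tps implied baseline")
```

Under that assumption the two models start from nearly the same baseline, so the difference in headline tps comes mostly from the difference in speedup factor.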



Summary generated by Claude — human-verified

