BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
In three lines: BeeLlama v0.2.0 delivers major decoding speedups from its DFlash update. On a single RTX 3090, Qwen 3.6 27B reaches up to 164 tps (a 4.40x speedup) and Gemma 4 31B up to 177.8 tps (4.93x). The release also adds full Gemma 4 31B support, reduces DFlash overhead, improves prefill handling, and enforces stricter draft/target validation.
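The summary gives accelerated throughput and a speedup factor but not the baseline itself. A quick back-of-the-envelope check recovers the implied baseline, assuming the speedup is defined as accelerated tps divided by baseline tps on the same RTX 3090. The helper name `implied_baseline_tps` is hypothetical and not part of BeeLlama.

```python
# Back-of-the-envelope check of the implied baseline decode throughput.
# Assumption: speedup = accelerated_tps / baseline_tps, measured on the same GPU.
# The reported speedups are rounded to two decimals, so results are approximate.

def implied_baseline_tps(accelerated_tps: float, speedup: float) -> float:
    """Return the baseline tokens/s implied by a throughput and its speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return accelerated_tps / speedup

# Peak ("up to") figures quoted in the release summary: (tps, speedup).
reported = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

# Round to one decimal, since the inputs carry limited precision.
baselines = {
    model: round(implied_baseline_tps(tps, speedup), 1)
    for model, (tps, speedup) in reported.items()
}

for model, base in baselines.items():
    print(f"{model}: ~{base} tps without DFlash")
```

Under that assumption, both models sit at roughly 36 to 37 tps without DFlash (about 37.3 tps for Qwen 3.6 27B and 36.1 tps for Gemma 4 31B). Because the speedup factors are rounded, treat these as estimates rather than measured values.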
Summary generated by Claude — human-verified