Reddit r/LocalLLaMA · 2 June 2026

Using Gemma 4 E4B with the LiteRT engine - ~2.4x speedup over Q4 GGUF in text generation, image processing roughly the same

In three lines: Gemma 4 E4B in Google's LiteRT format reaches 157.2 tok/s in text generation, about 2.4× faster than the Q4 GGUF build (66.3 tok/s), a gain attributed to multi-token prediction (MTP). Image captioning speeds up by only 1.1× because the vision encoder is the bottleneck. Tested on an RTX 4060 Ti 16GB.
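As a rough sanity check on the figures above, the sketch below recomputes the text-generation speedup from the two reported throughputs. It then uses Amdahl's law to estimate what share of the baseline captioning time would have to be token generation for a ~2.4× decode speedup to yield only 1.1× overall. The Amdahl model and the premise that only token generation gets faster are assumptions made here for illustration, not claims from the original post; the function names are hypothetical.

```python
def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Ratio of two throughputs measured in tokens per second."""
    return new_tok_s / old_tok_s


def accelerated_fraction(overall: float, part: float) -> float:
    """Solve Amdahl's law, overall = 1 / ((1 - p) + p / part), for p.

    p is the share of baseline time spent in the part that got faster.
    Assumption: everything outside that part runs at unchanged speed.
    """
    return (1 - 1 / overall) / (1 - 1 / part)


# Throughputs reported in the summary (tok/s).
text_speedup = speedup(157.2, 66.3)

# Assumption: the 1.1x captioning figure is an end-to-end speedup and
# only the token-generation stage benefits from the faster engine.
decode_share = accelerated_fraction(1.1, text_speedup)

print(f"text speedup: {text_speedup:.2f}x")
print(f"implied decode share of baseline captioning time: {decode_share:.0%}")
```

Under those assumptions, token generation would account for only a small minority of the baseline captioning time, which is consistent with the vision encoder dominating the image workload.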

Summary generated by Claude — human-verified
