Reddit r/LocalLLaMA

Using Gemma 4 E4B with the LiteRT engine - ~2.4x speedup over Q4 GGUF in text generation, image processing roughly the same

Signal: 72
Hype: 28
In three lines: Gemma 4 E4B in Google's LiteRT format reaches 157.2 tok/s in text generation, about 2.4× faster than the Q4 GGUF build (66.3 tok/s), a gain attributed to multi-token prediction (MTP). Image captioning speeds up by only 1.1× because the vision encoder is the bottleneck. Tested on an RTX 4060 Ti 16GB.
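The arithmetic behind these figures can be checked with a minimal Python sketch. The two throughput numbers come from the summary above; the helper names and the `fixed_fraction` parameter are hypothetical, introduced only for illustration. The second function applies Amdahl's law as one plausible way to see why a fast decoder barely moves end-to-end captioning time when most of that time is spent elsewhere. It is not a measurement from the post.

```python
# Throughput figures quoted in the summary (tokens per second).
LITERT_TOK_S = 157.2
GGUF_Q4_TOK_S = 66.3


def speedup(new_tok_s: float, base_tok_s: float) -> float:
    """Ratio of two throughputs; values above 1 mean the new engine is faster."""
    return new_tok_s / base_tok_s


def end_to_end_speedup(fixed_fraction: float, variable_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    fixed_fraction is the share of baseline time that does NOT speed up
    (for example, time spent in a vision encoder). It is a hypothetical
    input here, not a value reported in the post.
    """
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / variable_speedup)


text_ratio = speedup(LITERT_TOK_S, GGUF_Q4_TOK_S)
print(f"text generation speedup: {text_ratio:.2f}x")

# Assumed, illustrative share of captioning time that is unaffected by faster decoding.
ASSUMED_FIXED_FRACTION = 0.85
overall = end_to_end_speedup(ASSUMED_FIXED_FRACTION, text_ratio)
print(f"end-to-end speedup at {ASSUMED_FIXED_FRACTION:.0%} fixed time: {overall:.2f}x")
```

Under that assumed 85% fixed share, the overall gain comes out close to 1.1×, in line with the summary's claim that the vision encoder dominates captioning time.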
Gemini · Code generation · Vision · Benchmarks · Tools

Summary generated by Claude — human-verified