Using Gemma 4 E4B with the LiteRT engine: ~2.4× speedup over Q4 GGUF in text generation, image processing roughly the same
Gemma 4 E4B in Google's LiteRT format achieves 157.2 tok/s in text generation, 2.4× faster than Q4 GGUF (66.3 tok/s), thanks to multi-token prediction (MTP). Image captioning shows only a 1.1× speedup because the vision encoder is the bottleneck. Tested on an RTX 4060 Ti 16GB.
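
For anyone who wants to check the arithmetic, here is a minimal sketch. The two throughput figures are the ones reported above. The `end_to_end_speedup` helper and the 0.85 fixed-time fraction are my own illustrative assumptions, not measurements: they only show, Amdahl-style, how a stage that does not get faster (such as the vision encoder) can cap the overall gain even when decoding itself speeds up a lot.

```python
def speedup(new_tok_s: float, base_tok_s: float) -> float:
    """Throughput ratio: how many times faster the new backend is."""
    return new_tok_s / base_tok_s


def end_to_end_speedup(fixed_fraction: float, decode_speedup: float) -> float:
    """Amdahl-style estimate of overall speedup.

    fixed_fraction is the share of baseline wall time spent in a stage
    that is NOT accelerated; the remaining share is divided by
    decode_speedup.
    """
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / decode_speedup)


# Reported throughputs: LiteRT vs Q4 GGUF, text generation.
text = speedup(157.2, 66.3)
print(f"text generation speedup: {text:.2f}x")  # prints 2.37x, i.e. roughly 2.4x

# HYPOTHETICAL: assume 85% of baseline captioning time is a fixed stage.
# This fraction is made up for illustration; it was not measured.
image = end_to_end_speedup(fixed_fraction=0.85, decode_speedup=text)
print(f"illustrative image captioning speedup: {image:.2f}x")
```

With a fixed stage that large, the estimate lands close to 1×, which is consistent in spirit with the roughly 1.1× seen for image captioning. The actual split between encoder and decode time would have to be profiled.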