Reddit r/LocalLLaMA · 23 May 2026

Llama.cpp VS LiteRT on a custom Xiaomi 12 Pro 24/7 Server (V2 Redesign)

In three lines: A benchmark of llama.cpp against Google's LiteRT on a custom 24/7 server built from a Xiaomi 12 Pro (Snapdragon 8 Gen 1). llama.cpp reaches 30.6 t/s on prompt processing and 5.7 t/s on generation at moderate CPU load; LiteRT generates slightly faster but maxes out the CPU and draws more power. The setup features copper/aluminum cooling, a custom safe PSU, and a 3D-printed case.
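To put the llama.cpp throughput figures in practical terms, here is a rough sketch of how they translate into response time for a single request. The function name and the token counts are hypothetical, and the model assumes prompt processing and generation run back to back at the reported rates with no other overhead, which the post does not confirm.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       prompt_tps: float = 30.6, gen_tps: float = 5.7) -> float:
    """Estimate end-to-end seconds for one request.

    Defaults are the llama.cpp rates quoted in the summary
    (30.6 t/s prompt processing, 5.7 t/s generation).
    Assumption: the two phases are sequential and the rates stay constant.
    """
    if prompt_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Time to ingest the prompt plus time to generate the reply.
    return prompt_tokens / prompt_tps + output_tokens / gen_tps


if __name__ == "__main__":
    # Hypothetical request: 306-token prompt, 57-token reply.
    print(f"{estimate_latency_s(306, 57):.1f} s")
```

Under these assumptions, a 306-token prompt with a 57-token reply takes about 20 seconds, split evenly between the two phases, which shows how strongly the generation rate dominates for longer replies.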

Summary generated by Claude — human-verified
