arXiv cs.CL

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

Signal: 82
Hype: 15
In three lines: First end-to-end RAG pipeline to run all neural stages on a mobile NPU (the Hexagon NPU in Snapdragon X Elite), with embedding, reranking, and LLM generation all on-device. On a 120-query Wikipedia benchmark, it achieves 18.1x faster LLM prefill and 4.0x lower system energy than the CPU baseline. Answer quality is at parity (GPT-4.1 judge: 9.32 vs. 8.95 on CPU).
RAG, Embeddings

Summary generated by Claude — human-verified