Hacker News (AI)

Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

Signal: 35
Hype: 15
In three lines: Tiny-vLLM is a high-performance LLM inference engine written in C++ and CUDA. It is an open-source project shared on Hacker News, with minimal early engagement (score 5, 0 comments).
Infrastructure, Open source, Code generation

Summary generated by Claude — human-verified