Reddit r/LocalLLaMA

Benchmarks of 20 small LLMs on a 6GB RTX 4050

Signal: 72 · Hype: 18
In three lines: A benchmark of 20 small LLMs on a 6GB RTX 4050 GPU. The author tests Q4/Q6 GGUF quantizations with 6 qualitative probes (including tool calling, strict JSON, plan decomposition, and no path hallucination) instead of full benchmark suites. Prefill and generation speed are measured at 1k/8k/32k tokens to identify viable models for local inference on constrained hardware.
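To make the "strict JSON" probe and the speed figures concrete, here is a minimal sketch of how such checks could be scored. It is not the author's harness: the function names, the required keys, and the pass criterion (the whole reply must parse as a single JSON object) are assumptions made for illustration.

```python
import json


def passes_strict_json(reply: str, required_keys: set[str]) -> bool:
    """Return True only if the whole reply is one JSON object with the required keys.

    Hypothetical pass criterion: any surrounding prose or markdown fence
    makes json.loads fail, so the probe counts as failed.
    """
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed.keys())


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput for one phase (prefill or generation), timed separately."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


# Illustrative replies only, not outputs from any benchmarked model.
clean = '{"tool": "read_file", "args": {"path": "notes.txt"}}'
chatty = 'Sure! Here is the call: {"tool": "read_file", "args": {}}'
print(passes_strict_json(clean, {"tool", "args"}))
print(passes_strict_json(chatty, {"tool", "args"}))
```

Timing prefill and generation as separate phases matters because the two can differ widely on the same hardware, so a single combined figure would hide which one limits long-context use.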
Benchmarks · Open source · Code generation · AI Agents · Tools

Summary generated by Claude — human-verified