Reddit r/LocalLLaMA

Heterogeneous GPU Weighting & Layer Splitting

Signal: 65
Hype: 25
In three lines: Heterogeneous GPU load balancing for Ollama on an RTX 5090 + RTX 3090 pair. A custom implementation weights layer distribution by compute power (SMCount × ClockMHz) instead of free memory alone. Reported result: faster than the RTX 5090 running alone, using the 3090's VRAM without bottlenecking the 5090.
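
The weighting idea can be illustrated with a minimal Python sketch. This is not the poster's implementation and not Ollama's API: the `Gpu` class, the `split_layers` function, the `max_layers` memory cap, and the SM counts and clock speeds are all assumptions chosen for illustration. Each layer goes to the GPU whose load relative to its compute weight (SMCount × ClockMHz) would stay lowest, and no GPU receives more layers than its free VRAM is assumed to hold.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    sm_count: int    # streaming multiprocessors (illustrative value)
    clock_mhz: int   # clock speed in MHz (illustrative value)
    max_layers: int  # hypothetical cap: layers that fit in free VRAM

    @property
    def compute_weight(self) -> int:
        # Compute-power proxy named in the summary: SMCount x ClockMHz.
        return self.sm_count * self.clock_mhz


def split_layers(gpus: list[Gpu], total_layers: int) -> list[int]:
    """Return a per-GPU layer count, proportional to compute weight, capped by memory."""
    if sum(g.max_layers for g in gpus) < total_layers:
        raise ValueError("model does not fit in the combined free VRAM")
    counts = [0] * len(gpus)
    for _ in range(total_layers):
        # Only GPUs with remaining memory headroom may take another layer.
        candidates = [i for i, g in enumerate(gpus) if counts[i] < g.max_layers]
        # Pick the GPU whose load per unit of compute stays lowest after adding a layer.
        best = min(candidates, key=lambda i: (counts[i] + 1) / gpus[i].compute_weight)
        counts[best] += 1
    return counts


# Illustrative numbers only; real values should be queried from the driver.
gpus = [
    Gpu("RTX 5090", sm_count=170, clock_mhz=2407, max_layers=70),
    Gpu("RTX 3090", sm_count=82, clock_mhz=1695, max_layers=50),
]
layout = split_layers(gpus, total_layers=80)
for gpu, n in zip(gpus, layout):
    print(f"{gpu.name}: {n} layers")
```

With a memory-only split, the 3090 would receive a share close to its VRAM fraction and could hold back the faster card. Weighting by compute shifts most layers to the 5090, while the cap still keeps each card within its memory.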
Open source · Infrastructure · Llama

Summary generated by Claude — human-verified