arXiv cs.AI

CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters

Signal: 72
Hype: 18
In three lines: CoLLM unifies federated parameter-efficient fine-tuning (FL PEFT) and inference on shared edge GPU clusters. The system coordinates real-time parameter sharing through shadow adapter strategies and dynamically balances the two workloads to optimize both model quality and inference latency. The evaluation reports 3x higher goodput than existing systems.
Fine-tuning · Infrastructure

Summary generated by Claude — human-verified