Scaling-up BERT Inference on CPU (Part 1)
3 articles
Hugging Face publishes a guide to optimizing BERT inference on CPU. It is the first part of a series exploring scaling techniques that improve performance without a GPU.
Hugging Face releases Accelerate, a library for running training and inference across multiple GPUs or TPUs with minimal code changes. It is PyTorch-compatible and simplifies distributed computing and resource optimization.
Hugging Face releases a guide to distributed training of BART and T5 on Amazon SageMaker. The guide uses Hugging Face Transformers with multi-GPU and multi-node optimizations for summarization tasks, and includes code, benchmarks, and best practices.