Evaluating Language Model Bias with 🤗 Evaluate
10 articles
Hugging Face releases an evaluation tool to measure bias in language models. The Evaluate platform enables systematic testing of gender, racial, and other biases in LLMs through standardized benchmarks.
Hugging Face publishes a guide on progressing from PyTorch DDP to Accelerate to Trainer for mastering distributed training. The educational pathway demonstrates how to progressively abstract away complexity in multi-GPU/multi-node training setups.
OpenAI publishes research on scaling laws for reward model overoptimization. The researchers quantify how performance on the true objective degrades when a policy is optimized too aggressively against a learned proxy reward model, with implications for reinforcement learning training and model alignment.
Hugging Face releases MTEB, the Massive Text Embedding Benchmark, for evaluating text embedding models. It covers 8 tasks (retrieval, clustering, classification, etc.) across 58 datasets and 112 languages, enabling systematic comparison of embedding model performance.
Hugging Face releases a getting started guide for Inference Endpoints, enabling users to deploy and serve models in production through a managed API without infrastructure management.
Stable Diffusion is now available in JAX/Flax. This implementation provides improved performance and scalability for inference and training on TPU and GPU. Code is open-source on Hugging Face.
Hugging Face documents optimization of BLOOM model inference. The article details techniques applied to reduce latency and increase throughput, including quantization, batching, and hardware optimizations.
Hugging Face introduces DOIs (Digital Object Identifiers) for datasets and models on its platform. DOIs enable standardized academic citation and permanent traceability of AI resources, facilitating reproducibility and scientific integrity.
rinna, publishing on Hugging Face, releases a Japanese-specific version of Stable Diffusion for generating images from Japanese text prompts. The model is intended to handle Japanese-language prompts and Japan-specific expressions better than the original English-centric model.
Hugging Face publishes guidance on evaluating very large language models. The article covers methodologies, benchmarks, and challenges specific to massive-scale LLMs.