🚀 Accelerating LLM Inference with TGI on Intel Gaudi
Hugging Face accelerates LLM inference using Text Generation Inference (TGI) on Intel Gaudi processors. The solution optimizes latency and throughput for production deployments.
Gradio releases an updated Dataframe component with an improved interface for tabular data manipulation and visualization. The component includes native inline editing, sorting, filtering, and export capabilities without additional code.
Hugging Face rolls out new analytics tools for Inference Endpoints, enabling users to monitor performance, latency, and resource usage in real-time.
NVIDIA announces at GTC 2025 new open models and datasets for Physical AI developers. The initiative aims to accelerate research in robotics and autonomous systems through open-source resources.
Xet, a version control tool for large data and ML models, is now available on the Hugging Face Hub. It integrates natively with the Hub to version large files, datasets, and checkpoints.
Hugging Face releases Open R1 Update #3, advancing its open-source reasoning model. The update summarizes improvements to the model and its reasoning capabilities since the previous release.
Practical guide to running LLMs on mobile devices via React Native. Hugging Face demonstrates how to deploy language models directly on phones without server connectivity, including code examples and performance benchmarks.
Hugging Face and JFrog partner to enhance AI security transparency. The partnership aims to integrate security and compliance tools into the Hugging Face ecosystem, enabling developers to better assess model and dependency risks.
Arize Phoenix enables tracing and evaluation of AI agents. The tool provides visibility into API calls, agent decisions, and performance metrics. Integration with popular frameworks for production monitoring.
Mercari integrates GPT-4o mini and GPT-4 to enhance product listings and assist sellers. New features include AI Listing Support and Mercari AI Assistant, designed to boost sales on the marketplace platform.
Endex builds an autonomous financial analyst using OpenAI's o1 and o3-mini reasoning models. The models enable advanced financial analysis without manual intervention.
Hugging Face integrates remote VAEs (Variational Autoencoders) into Inference Endpoints for decoding. This feature enables using remotely hosted VAE models without local loading, optimizing resource usage and latency.
Hugging Face releases SigLIP 2, an improved multilingual vision-language encoder. The model delivers better performance on vision and multilingual understanding tasks compared to its predecessor.
Hugging Face adds three new serverless inference providers: Hyperbolic, Nebius AI Studio, and Novita. These integrations expand model deployment options on the Hugging Face platform.
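Hugging Face fixes its Open LLM Leaderboard by integrating Math-Verify, a tool that checks whether a model's answer is mathematically equivalent to the reference answer, to evaluate language models' math reasoning more accurately. This addresses limitations of the previous answer-matching approach, which could mark correct answers as wrong when they were formatted differently.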
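To illustrate why exact string matching undercounts correct answers, here is a minimal Python sketch of a value-based equivalence check. It is not Math-Verify's implementation or API: `parse_answer` and `answers_match` are hypothetical names, and the sketch only normalizes `$...$`, `\boxed{...}`, simple `\frac{a}{b}`, plain fractions, and decimals, whereas the real library handles much richer LaTeX and symbolic answers.

```python
import re
from fractions import Fraction
from typing import Union


def parse_answer(text: str) -> Union[Fraction, str]:
    """Normalize a final answer into an exact number when possible (hypothetical helper)."""
    s = text.strip().strip("$").strip()
    # Unwrap \boxed{...}, a common way models mark their final answer.
    boxed = re.fullmatch(r"\\boxed\{(.*)\}", s)
    if boxed:
        s = boxed.group(1).strip()
    # Rewrite a simple LaTeX fraction \frac{a}{b} as "a/b".
    frac = re.fullmatch(r"\\frac\{(-?\d+)\}\{(-?\d+)\}", s)
    if frac:
        s = f"{frac.group(1)}/{frac.group(2)}"
    s = s.replace(" ", "")
    try:
        # Fraction parses integers, decimals, and "a/b" exactly (no float rounding).
        return Fraction(s)
    except (ValueError, ZeroDivisionError):
        # Not a number: fall back to whitespace-insensitive string comparison.
        return s


def answers_match(gold: str, predicted: str) -> bool:
    """Compare answers by value rather than by exact string."""
    return parse_answer(gold) == parse_answer(predicted)


print("0.5" == "1/2")                          # exact string match: False
print(answers_match("$\\frac{1}{2}$", "0.5"))  # value match: True
```

A string comparison rejects `0.5` against `\frac{1}{2}` even though both are correct; comparing parsed values accepts it, which is the kind of false negative an equivalence checker is meant to remove.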
OpenAI releases an updated Model Spec, the document defining expected behaviors for its models. This specification guides development and evaluation of capabilities and safety boundaries.
Hugging Face optimizes uploads and downloads on the Hub by moving transfers from individual chunks to larger blocks. This architecture reduces latency and improves stability for large file transfers.
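The chunks-to-blocks idea can be sketched as greedy packing of consecutive chunks into transfer units with a size cap. This is a hypothetical illustration, not the Hub's actual algorithm; the function `group_chunks_into_blocks` and the byte cap are assumptions chosen for the example.

```python
from typing import List


def group_chunks_into_blocks(chunk_sizes: List[int], max_block_bytes: int) -> List[List[int]]:
    """Greedily pack consecutive chunks into blocks of at most max_block_bytes.

    Hypothetical sketch: returns, for each block, the indices of the chunks it holds.
    """
    if max_block_bytes <= 0:
        raise ValueError("max_block_bytes must be positive")
    blocks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(chunk_sizes):
        if size > max_block_bytes:
            raise ValueError(f"chunk {index} ({size} bytes) exceeds the block size cap")
        # Close the current block if adding this chunk would overflow it.
        if current and current_bytes + size > max_block_bytes:
            blocks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        blocks.append(current)
    return blocks


# Five chunks packed into blocks of at most 100 bytes.
print(group_chunks_into_blocks([40, 30, 50, 10, 70], max_block_bytes=100))
# [[0, 1], [2, 3], [4]]
```

Transferring three blocks instead of five chunks means fewer requests per file. A real system would presumably also record where each chunk sits inside its block so that chunks can still be deduplicated and fetched individually.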
Hugging Face releases Open R1 Update #2, advancing its open-source reasoning model. The update improves performance and reasoning capabilities on complex tasks.
OpenAI partners with Schibsted Media Group to bring Schibsted's news and archive content into ChatGPT.
OpenAI introduces data residency in Europe, strengthening its enterprise-grade data privacy, security, and compliance programs for customers worldwide.
OpenAI deploys its latest reasoning models to U.S. National Laboratories to accelerate scientific breakthroughs.
Hugging Face publishes a guide for deploying and fine-tuning DeepSeek models on AWS. The tutorial covers cloud infrastructure, resource optimization, and practical fine-tuning steps on the AWS platform.
Hugging Face publishes a state-of-the-art overview of open-source video generation models integrated in Diffusers. The article covers architectures, performance metrics, and use cases of major available models.
KVPress is a toolkit of key-value (KV) cache compression methods for LLMs that reduces memory usage on long contexts with limited impact on output quality. Hugging Face presents the approach and its integration into models.
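As a rough illustration of KV cache compression in general, the sketch below prunes cached positions using one simple scoring rule: keep the keys with the smallest L2 norm, a heuristic from published work on KV cache pruning. It does not use the KVPress library; `compress_kv_cache` and its `compression_ratio` argument are hypothetical, and KVPress itself offers several scoring strategies and operates on model tensors rather than Python lists.

```python
import math
from typing import List, Tuple

Vector = List[float]


def compress_kv_cache(
    keys: List[Vector], values: List[Vector], compression_ratio: float
) -> Tuple[List[Vector], List[Vector]]:
    """Drop a fraction of cached positions, keeping those with the smallest key L2 norm.

    Hypothetical sketch of one scoring rule; not a KVPress API.
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    n_keep = max(1, math.ceil(len(keys) * (1.0 - compression_ratio))) if keys else 0
    # Score each position: under this heuristic, a lower key norm means more important.
    norms = [math.sqrt(sum(x * x for x in k)) for k in keys]
    keep = sorted(range(len(keys)), key=lambda i: norms[i])[:n_keep]
    keep.sort()  # restore original token order
    return [keys[i] for i in keep], [values[i] for i in keep]


keys = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0], [6.0, 8.0]]  # toy key vectors, one per token
values = [[10.0], [11.0], [12.0], [13.0]]
k, v = compress_kv_cache(keys, values, compression_ratio=0.5)
print(k, v)  # [[0.0, 1.0], [1.0, 0.0]] [[11.0], [12.0]]
```

With `compression_ratio=0.5`, half of the cached positions are dropped, halving this toy cache's memory; how much output quality is preserved depends on the scoring rule and the task.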
Hugging Face and FriendliAI partner to optimize model deployment on the Hub. The partnership aims to improve inference performance and model accessibility through integration of FriendliAI's technology.
OpenAI announces Stargate, an AGI infrastructure project with strategic partners. The initiative aims to mobilize the industrial ecosystem: data centers, power, land, construction, and equipment.
Hugging Face analyzes the correlation between CO₂ emissions and model performance on the Open LLM Leaderboard. The study finds that larger models consume more energy without proportional performance gains, raising questions about the energy efficiency of large LLMs.
Hugging Face publishes a guide to visualize and understand GPU memory usage in PyTorch. The article provides tools and techniques to diagnose memory bottlenecks when training models.
Hugging Face introduces ModernBERT, a model designed to replace BERT by incorporating modern architectural improvements. The model improves performance on classification and text representation tasks while being more computationally efficient.
Hugging Face publishes benchmark results for language model performance on 5th Gen Xeon processors on GCP. Evaluates latency and throughput across different model architectures.
Hugging Face introduces a Synthetic Data Generator enabling dataset creation through natural language. The tool automates training data production without costly manual annotation.
Analysis of 78 election deepfakes: political misinformation is not primarily an AI problem. Electoral manipulation issues predate the technology and cannot be solved by technical solutions alone.
Hugging Face launches LeMaterial, an open source initiative to accelerate materials discovery. The project combines AI models and public datasets to help researchers and industry explore new materials faster.
Hugging Face releases an open preference dataset for text-to-image generation built by the community. The dataset includes human preference annotations for training and evaluating image generation models.
Sora is OpenAI's video generation model that produces videos from text, image, or video inputs. It builds on learnings from DALL-E and GPT models to provide expanded tools for storytelling and creative expression.
Hugging Face tests LLMs' ability to fix their own mistakes through a chatbot arena experiment using Keras and TPUs. The study evaluates whether models can identify and repair incorrect responses without external intervention.
Hugging Face presents a case study with CFM (Capital Fund Management) on fine-tuning small models using insights from large LLMs. The approach improves compact model performance without a significant increase in computational cost.
Hugging Face releases a guide for open-source developers on EU AI Act compliance. The document covers legal obligations, risk categories, and implications for open-source AI models and systems.
Hugging Face rearchitects its uploads and downloads infrastructure. The platform optimizes storage and file transfer systems to improve performance and reliability for model and dataset operations.
Hugging Face introduces SmolVLM, a compact yet performant vision-language model. The model combines computational efficiency with advanced multimodal capabilities for image understanding and text tasks.