Holo3.1: Fast & Local Computer Use Agents
Hugging Face releases Holo3.1, a fast local computer use agent for task automation. The model runs on-device without cloud dependency, enabling speed and privacy for system-level actions.
JetBrains releases Mellum2, a 12B Mixture-of-Experts model. The model combines computational efficiency with performance, designed for code and reasoning tasks.
Hugging Face argues that enterprise AI adoption beyond LLMs requires scalable agent logic. The article explores how multi-agent systems and orchestration become critical for deploying AI beyond simple use cases.
NVIDIA releases Cosmos 3, an open omni-model for physical AI that reasons and acts. The model processes video, text, and images to understand real-world physics and generate robotic actions.
A beginner's guide to PyTorch profiling with torch.profiler. It covers how to measure performance and identify bottlenecks in AI models, with practical examples for newcomers.
ITBench-AA, a new benchmark from Artificial Analysis and IBM, evaluates frontier models on agentic enterprise IT tasks. Top models (Claude, GPT-4, Gemini) score below 50%, exposing significant gaps in automating complex IT workflows.
Hugging Face introduces Delta Weight Sync in TRL to optimize deployment of trillion-parameter models. The technique syncs only weight changes rather than full models, drastically reducing storage and bandwidth requirements for updates.
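The idea of syncing only what changed can be illustrated with a small sketch. This is not TRL's implementation: the function names `compute_delta` and `apply_delta`, the per-tensor granularity, and the plain-list "tensors" are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: tensor name -> flat list of values.
Weights = Dict[str, List[float]]


def compute_delta(old: Weights, new: Weights, tol: float = 0.0) -> Weights:
    """Return only the tensors that differ between two checkpoints."""
    delta: Weights = {}
    for name, new_values in new.items():
        old_values = old.get(name)
        if old_values is None or len(old_values) != len(new_values):
            # New or resized tensor: it has to be sent in full.
            delta[name] = list(new_values)
        elif any(abs(a - b) > tol for a, b in zip(old_values, new_values)):
            # At least one element moved by more than tol: include the tensor.
            delta[name] = list(new_values)
    return delta


def apply_delta(base: Weights, delta: Weights) -> Weights:
    """Rebuild the updated checkpoint from the base plus the changed tensors."""
    updated = {name: list(values) for name, values in base.items()}
    updated.update({name: list(values) for name, values in delta.items()})
    return updated


old = {"layer0.weight": [0.1, 0.2], "layer1.weight": [0.3, 0.4]}
new = {"layer0.weight": [0.1, 0.2], "layer1.weight": [0.3, 0.5]}

delta = compute_delta(old, new)
print(sorted(delta))  # only the tensors that need to be transferred
restored = apply_delta(old, delta)
```

In this toy case only one of the two tensors is transferred. A real system would also have to handle deleted tensors, sharding, and compression, which this sketch leaves out.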
Reachy Mini, Pollen Robotics' humanoid robot, now runs fully locally without cloud dependency. It integrates open-source models (Llama, Whisper) for vision, speech, and motor control, and is deployed on embedded hardware.
Hugging Face clarifies AI agent terminology, distinguishing the harness (execution infrastructure), the scaffold (coordination structure), and the agent (autonomous system). The definitions are meant to reduce confusion in the ecosystem.
NVIDIA and Hugging Face introduce Nemotron-Labs, diffusion-based language models to accelerate text generation. The approach parallelizes token generation, reducing latency compared to traditional autoregressive methods.
Hugging Face argues that AI model specialization outperforms raw scale in procurement decisions. Organizations typically favor large generalist models, overlooking that smaller specialized models deliver better performance and lower costs for specific tasks.
Hugging Face releases OlmoEarth v1.1, a more efficient family of models for geospatial tasks. The new models deliver improved performance and inference speed compared to the previous version.
Hugging Face introduces the Ettin Reranker family, models designed to improve search relevance and RAG result ranking. These rerankers optimize document ranking after initial retrieval.
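A two-stage "retrieve, then rerank" pipeline can be sketched as follows. Everything here is illustrative: `retrieve`, `rerank`, and `toy_relevance` are hypothetical names, and the word-overlap scorer is only a stand-in for a real reranker model such as those in the Ettin family.

```python
from typing import Callable, List, Tuple


def retrieve(query: str, corpus: List[str], top_k: int) -> List[str]:
    """Cheap first stage: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Second stage: rescore only the retrieved candidates with a costlier scorer."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def toy_relevance(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranker: share of document words found in the query."""
    query_words = set(query.lower().split())
    words = doc.lower().split()
    return sum(word in query_words for word in words) / len(words) if words else 0.0


corpus = [
    "a long catalog page that mentions models once and a reranker once",
    "reranker models reorder retrieved documents",
    "weather report for tuesday",
]
query = "reranker models"

candidates = retrieve(query, corpus, top_k=2)
best = rerank(query, candidates, toy_relevance, top_n=1)
print(best[0][1])
```

The first stage cannot tell the two matching documents apart, since both share two words with the query. The second stage promotes the shorter, more focused one, which is the role a reranker plays after initial retrieval.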
Hugging Face releases a guide for fine-tuning NVIDIA Cosmos Predict 2.5, a robot video generation model, using LoRA/DoRA. The method reduces GPU resource requirements while maintaining generation quality for specialized robotics use cases.
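The resource savings of LoRA come from training two small low-rank matrices instead of the full weight matrix. The arithmetic below is a minimal sketch covering LoRA only (DoRA's extra magnitude parameters are not modeled); the layer sizes and rank are illustrative assumptions, not values from the guide.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A, with B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes only; they are assumptions, not taken from Cosmos Predict 2.5.
d_out, d_in, rank = 4096, 4096, 16

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,} trainable parameters")
print(f"lora: {lora:,} trainable parameters")
print(f"ratio: {lora / full:.6f}")
```

For a single square layer of this size, the low-rank update trains under 1% of the parameters, which is where the lower optimizer-state and gradient memory comes from.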
PaddleOCR 3.5 integrates a Transformers backend for OCR and document parsing tasks. The new version improves accuracy and flexibility by leveraging Transformers models, enabling better text recognition and structured data extraction.
Hugging Face launches a public leaderboard to evaluate open-source AI agents. The platform ranks models by their ability to complete complex tasks, with reproducible benchmarks and transparent results.
IBM and Hugging Face release Granite Embedding Multilingual R2, an open-source embedding model under Apache 2.0 license. The model supports 32K token context and delivers best-in-class retrieval quality for sub-100M parameter models across multiple languages.
Hugging Face introduces an asynchronicity technique for optimizing continuous batching in inference servers. The method improves throughput by handling requests non-blockingly, reducing latency and increasing GPU resource utilization.
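As a rough illustration of continuous batching with non-blocking admission, the sketch below lets waiting requests join the running batch at each decode step instead of waiting for the whole batch to finish. It is a generic toy built on `asyncio`, not the technique described in the article; `Request`, `serve`, and the fake decode step are assumptions.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    name: str
    tokens_left: int
    output: List[int] = field(default_factory=list)


async def serve(queue: asyncio.Queue, max_batch: int, total: int) -> List[str]:
    """Decode until `total` requests have finished; return the order they finished in."""
    running: List[Request] = []
    finished: List[str] = []
    step = 0
    while len(finished) < total:
        # Admit waiting requests into free slots without blocking the decode loop.
        while len(running) < max_batch and not queue.empty():
            running.append(queue.get_nowait())
        if not running:
            # Nothing to decode yet: wait for the next arrival.
            running.append(await queue.get())
        # One decode step for every running request (stand-in for a GPU forward pass).
        for req in running:
            req.output.append(step)
            req.tokens_left -= 1
        finished.extend(req.name for req in running if req.tokens_left == 0)
        # Finished requests free their slots immediately.
        running = [req for req in running if req.tokens_left > 0]
        step += 1
        await asyncio.sleep(0)  # yield so other coroutines can enqueue requests
    return finished


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for name, length in [("a", 3), ("b", 1), ("c", 2)]:
        queue.put_nowait(Request(name, length))
    return await serve(queue, max_batch=2, total=3)


finish_order = asyncio.run(main())
print(finish_order)
```

With a batch size of two, request `c` takes over the slot freed by the short request `b` after the first step, so the batch stays full while `a` is still decoding.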
Hugging Face and AWS collaborate to provide optimized building blocks for foundation model training and inference on AWS infrastructure, including SageMaker integrations and open-source tools.
Hugging Face introduces EMO, a pretrained mixture of experts (MoE) model designed to develop emergent modularity. The approach aims to create specialized experts that naturally form during training, improving model efficiency and performance.
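For readers unfamiliar with mixture-of-experts layers, the routing step can be sketched generically as below. This is standard top-k routing, not EMO's specific method; the function names and the scalar "experts" are hypothetical simplifications.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts are never run."""
    return sum(weight * experts[i](x) for i, weight in route(router_logits, k))


# Toy scalar experts; in a real model each expert is a feed-forward network.
experts = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: -x]
router_logits = [2.0, 2.0, -5.0]  # illustrative router scores for one token

selected = route(router_logits, k=2)
result = moe_forward(3.0, router_logits, experts, k=2)
print(selected, result)
```

Because only the selected experts run for each token, compute per token stays well below what the total parameter count would suggest, which is the efficiency argument usually made for MoE models.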
vLLM transitions from v0 to v1, prioritizing correctness before optimizations. The update introduces reliability and accuracy improvements in LLM inference, focusing on result validation before applying reinforcement learning techniques.
Hugging Face adds anti-Benchmaxxer filtering to the open ASR leaderboard to prevent artificial benchmark optimization. The system detects models over-optimized for test metrics without real generalization.
IBM and Hugging Face introduce Granite 4.1, a family of language models optimized for enterprise tasks. Models range from 8B to 34B parameters, with multilingual support and enhanced reasoning capabilities.
DeepInfra joins Hugging Face Inference Providers. The integration enables access to models via the DeepInfra API directly from the Hugging Face platform.
NVIDIA releases Nemotron 3 Nano Omni, a multimodal model handling documents, audio and video with extended context. Optimized for agents, it unifies vision, voice and text processing in a single architecture.
Hugging Face publishes a guide for building scalable web applications using OpenAI's Privacy Filter. The solution enables processing sensitive data without exposing it directly to OpenAI APIs.
DeepSeek-V4 achieves 1 million token context window usable by agents. The model improves long-context handling and multi-step task processing capabilities.
Hugging Face publishes a guide for integrating Transformers.js into a Chrome extension. The library enables running transformer models directly in the browser without a backend server.
Hugging Face launches QIMMA, a quality-focused Arabic LLM leaderboard. The platform evaluates Arabic language models against rigorous criteria, providing a transparent benchmark for Arabic language performance.
Hugging Face argues that openness in AI is critical for cybersecurity. Open-source models enable better vulnerability detection and increased transparency against threats. The article advocates against closed proprietary approaches.
Hugging Face introduces Ecom-RLVE, an adaptive verifiable environment for training e-commerce conversational agents. The system uses reinforcement learning with simulated environments to optimize customer interactions and conversion rates.
Sentence Transformers adds native support for multimodal models (text + image) for training and fine-tuning embeddings and rerankers. The update includes code examples, pre-trained models, and performance benchmarks.
Hugging Face analyzes VAKRA, a framework for agents exploring reasoning, tool use, and failure modes. The study examines multi-agent capabilities and limitations, with a focus on robustness and error cases.
Hugging Face releases Waypoint-1.5, a 3D interactive simulation model optimized for consumer GPUs. The system generates high-fidelity controllable virtual worlds in real-time, reducing computational requirements compared to previous versions.
Hugging Face releases multimodal embedding and reranker models integrated with Sentence Transformers. These models combine text and images to enhance semantic search and relevance ranking in RAG systems.
Safetensors, the model serialization format created by Hugging Face, officially joins the PyTorch Foundation. This integration strengthens standardization of storage formats for deep learning models.
Google DeepMind releases Gemma 4, a frontier multimodal model handling text, images, and video. Optimized for on-device inference, it delivers advanced reasoning capabilities and outperforms competing models on key benchmarks.
Hugging Face introduces Falcon Perception, a multimodal vision model capable of processing images and videos. The model combines visual perception and language understanding for multimodal analysis tasks.
Gradio now enables building custom frontends using its backend independently of the default interface. Developers can create custom frontends in JavaScript/React while leveraging existing Gradio infrastructure.
IBM and Hugging Face release Granite 4.0 3B Vision, a compact 3-billion-parameter multimodal model optimized for enterprise document analysis. The model combines vision and language capabilities to process images and text, designed for efficient deployment on resource-constrained hardware.
Hugging Face trained language models on mRNA sequences across 25 species for $165. Models capture biological patterns and enable secondary structure and gene expression predictions without manual annotation.
Hugging Face releases TRL v1.0, a post-training library for language model fine-tuning. Version 1.0 marks API stability and includes support for DPO, PPO, and optimizations for models like Llama and Mistral.
Hugging Face releases OpenClaw, an open-source tool to unlock robotic manipulation capabilities. The project aims to democratize access to gripper and robotic arm control technologies through a collaborative platform.
Hugging Face introduces EVA, a new framework for evaluating voice agents. The framework establishes standardized metrics to measure performance, robustness, and utility of voice dialogue systems in production.
Hugging Face releases a guide to build a domain-specific embedding model in under a day. The method uses efficient fine-tuning techniques on domain-specific data to create optimized embeddings without expensive infrastructure.
Hugging Face releases its biannual report on the open source ecosystem. The report covers trends in models, datasets, and tools available on the platform in spring 2026, with metrics on growth and adoption.
Hugging Face releases Holotron-12B, a 12-billion-parameter AI agent designed for high-throughput computer automation tasks. The model combines vision and action capabilities to interact with user interfaces.
Hugging Face introduces Storage Buckets on its Hub, enabling users to store and manage large files directly on the platform. This feature simplifies sharing of datasets, models, and artifacts without relying on external services.
Hugging Face analyzes 16 open-source reinforcement learning libraries to identify best practices in token management and data flow optimization. The study documents implementation patterns, scalability challenges, and recommendations for maintaining RL system efficiency in production.
LeRobot v0.5.0 scales across all dimensions: larger models, expanded datasets, multi-robot support, and training optimizations. The open-source robotics platform reaches production maturity with standardized benchmarks and modern architecture integration.