Microsoft data suggests using AI is more expensive than hiring people
Microsoft internal data suggests AI usage for certain tasks costs more than hiring human workers. The article raises questions about the actual ROI of enterprise AI deployments.
3147 articles
StepFun releases Step 3.7 Flash, a 196B/11B active MoE multimodal model with built-in 1.8B ViT. SWE-Bench Pro: 56.26% (beats DeepSeek V4 Flash 55.6%), DeepSearchQA F1: 92.82%. Runs locally on 128GB RAM.
An anonymous LLM called Hy3 is topping OpenRouter's rankings by a large margin. Its identity and technical details remain unknown, raising questions about its origin and actual capabilities.
Lance model optimization for RTX 2080 Ti 22GB on single and dual-GPU setups. Custom operator configurations for Turing architecture, pipeline/tensor parallelism across 44GB combined VRAM, reproducible open-source scripts.
Vercel Sandbox now supports installing and running Docker inside sandboxes without touching the host system. Enables testing containerized services like Redis/Postgres, validating container images before deployment, and previewing containerized applications. Also adds FUSE filesystem drivers and VPN client support.
Vercel Sandbox now allows binding port 8080 to an ingress domain. The controller port has moved to 23456.
OpenAI publishes guidance for third-party AI evaluations, covering assessment of model capabilities, safeguards, and validity for frontier systems.
Beginner's guide to PyTorch profiling using torch.profiler. Covers how to measure performance and identify bottlenecks in AI models, with practical examples for newcomers.
Anthropic releases Claude Opus 4.8, described as a "modest but tangible improvement" over 4.7. The model excels in honesty: 4x less likely to let code flaws pass unremarked, and abstains more on uncertain questions. Pricing unchanged: $5/M input tokens, $25/M output.
Release of llm-anthropic 0.25.1: adds the Claude Opus 4.8 model and a -o fast 1 option for fast mode (for organizations with it enabled); the default max_tokens now matches each model's maximum output instead of 8192.
Reddit user shares a method to run SearXNG (a metasearch engine) on Windows without Docker or WSL. Practical approach for local deployment.
ByteDance is developing custom Arm and RISC-V processors to reduce inference costs for its models. The group processes 120 trillion tokens/day with Doubao and aims to reduce Nvidia GPU dependency by optimizing server infrastructure.
Anthropic granted operational access to Claude via Project Glasswing to the US Federal Reserve and Bank of England, but no EU institution has such access based on available information.
Streamlit app to benchmark yourself against open-source LLMs across 5 benchmarks. Share results on CV/LinkedIn. BBQ benchmark featured.
Starbucks incorporates AI usage into bonus evaluations for tech workers. The coffee chain adjusts compensation policy to reward AI tool adoption among IT staff.
Claude CLI >= 2.1.154 introduces "ctx", "msg", and "system" roles for API messages, breaking vLLM compatibility. A one-line patch in vLLM restores compatibility and enables Claude workflows with local models like MiniMax-M2.7.
Call for papers for 2nd Workshop on Social Simulation with LLMs (Social Sim'26) @ COLM 2026. Theme: "Fidelity in Applications". Deadline June 23, 2026. Focus on evaluation, robustness, interpretability, and empirical validation of LLM-based simulated societies.
Anthropic raises $65 billion in Series H at a $965 billion valuation. Annualized revenue reaches $47 billion according to CFO Krishna Rao. The company will invest in safety research, computing capacity, and expanding its Claude product lineup.
Anthropic releases Claude Opus 4.8, outperforming GPT-5.5 and Gemini 3.1 Pro on most benchmarks. The model catches its own coding errors 4× better than its predecessor. Anthropic also rolls out dynamic workflows enabling hundreds of parallel sub-agents for codebase-wide migrations.
Amazon discontinues its internal AI leaderboard to prevent employees from optimizing for usage metrics instead of quality. The platform was driving metrics-chasing behavior that diverted teams from actual business goals.
Anthropic releases Claude Opus 4.8 on May 28, 2026. The model is reportedly four times less prone to errors, with emphasis on honesty regarding its own failures.
Open Envelope is an open schema for defining AI agent teams. The project proposes a standardized specification to compose and orchestrate multiple agents in collaborative workflows.
Mimo 2.5 Pro achieves 40 t/s on 8x Nvidia GB10 cluster with 1k context, degrading to 17 t/s at 250k context. Parallelization: 60 t/s (2 requests), 83 t/s (4 requests). 1T model optimized via mtp-2.
Official Compound Engineering plugin for Claude Code, Codex, Cursor and other editors. Native integration to enhance development workflows.
Markdown rendering tool with specialized support for SVG fenced code blocks. Renders the image and provides a tab to switch to code view. Accepts raw Markdown, CORS-enabled URLs, or Gists.
Sam Altman (OpenAI) and Dario Amodei (Anthropic) are walking back previous predictions of an AI-driven job apocalypse. Both executives are moderating their rhetoric about AI's impact on employment.
RAG (Retrieval-Augmented Generation) enhances language models by providing access to external data, reducing hallucinations and errors. This approach combines document retrieval and generation to optimize answer relevance.
VeritasReason is an open-source Python framework that adds a structured reasoning and provenance layer to AI agents. It provides queryable context graphs, a forward-chaining rule engine (YAML), W3C PROV-O provenance, and policy compliance checking. Works with OpenAI, Anthropic, Groq, Ollama.
Mistral launches Vibe, a unified AI capable of handling meetings, documents, and code in a single interface. The product aims to eliminate the need to switch between multiple specialized tools.
Cognition and OpenInspect showcase async agents as an emerging paradigm: 80% of Devin commits from agents, automated spec-to-PR workflows, full VMs, persistent agent memory, and PMs shipping code directly.
Open-source tool to bootstrap a team of coding agents from a template. Project shared on Hacker News with limited engagement (3 points, 0 comments).
Google is developing autonomous AI agents with Remy and Gemini Spark, replacing passive chatbots with tools capable of independent actions and increased productivity.
Anthropic raises $65 billion in Series H funding at $965 billion post-money valuation. This major funding round reflects continued investor confidence in the company's AI development trajectory.
An r/LocalLLaMA user questions IBM's decision to return to a pure transformer architecture for Granite 4.1, abandoning Granite 4's hybrid mamba-attention design. On modest hardware (8GB VRAM), Granite 4 delivered 128k context at ~1000 tok/s ingestion, while Granite 4.1 caps at 14k context and ~300 tok/s. The user asks whether IBM will continue offering the mamba architecture.
AgingBench, a new longitudinal deployment benchmark, shows that swapping Claude Sonnet 4.6 for Opus 4.7 in the Claude Code CLI agent drops PyTest pass rate by ~15%. Memory policy alone drives a 4.5x spread in agent half-life across scenarios, larger than any model swap tested.
Claude Code now supports dynamic workflows, enabling users to create adaptive task sequences. The feature enhances automation and flexibility in AI-assisted coding processes.
Wall-OSS-0.5 is a 4B VLA from X Square Robot with open training code. Zero-shot evaluation on 17 real-robot tasks: 4 tasks >80% progress, including Rope Tightening (82%). Post fine-tuning: 60.5% average task progress (+17.5pp vs pi0.5). Mixture-of-Transformers architecture with vision-aligned RVQ tokenizer and distributed DMuon optimizer.
LiquidAI releases LFM2.5-8B-A1B, a hybrid 8B model optimized for on-device inference (CPU/GPU). Extended architecture with reinforcement learning, compatible with llama.cpp/MLX/vLLM/SGLang. Performance competitive with larger models on agentic tasks and complex instruction following.
Qualcomm announces a $300 Windows on ARM laptop, positioning it as an affordable alternative to traditional laptops. The manufacturer promises a functional device for essential use cases.
Comparative test of Qwen 3.6 35B across output formats (raw text, markdown, HTML, HTML+CSS). Markdown achieves best quality (78/100 per ChatGPT-4o) with 1,496 output tokens in 23s. HTML+CSS generates 10,290 tokens in 82s but lower quality score (58/100). Measurements include reasoning tokens, throughput, and total time.
Google Cloud launches 'AI Threat Defense', a platform automating detection, assessment, and patching of security flaws in enterprise systems. It integrates technologies from acquisitions.
Writ is an enforcement layer for AI coding agents using a local Neo4j knowledge graph and hybrid RAG. A 5-stage retrieval pipeline (BM25, HNSW vector similarity, graph traversal, reciprocal rank fusion) surfaces only relevant rules per task. 30 bash hook scripts enforce execution: no code without approved plan, mandatory tests, static analysis required.
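The fusion step named above can be illustrated with the standard reciprocal rank fusion formula, where each item scores the sum of 1 / (k + rank) over the lists it appears in. This is a minimal sketch, not Writ's code: the rule IDs, the three input lists, and the k=60 constant are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is a commonly used default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by name so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rule IDs returned by three independent retrievers.
bm25_hits = ["rule-tests", "rule-plan", "rule-lint"]
vector_hits = ["rule-plan", "rule-lint", "rule-docs"]
graph_hits = ["rule-plan", "rule-tests"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

Because only ranks are used, the retrievers' raw scores (BM25 values, cosine similarities) never need to be put on a common scale.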
Ktx is an open-source executable context layer for data agents. It enables agents to access and manipulate data in real time through a standardized interface.
Comparison of Reddit data collection options for ML projects. Official API (100 req/min, 500-comment truncation) inadequate. Pushshift defunct. Author recommends Sylvia: 480 free req/min, $0.0005/request thereafter, full recursive comment resolution, historical archive access.
Google unveils Coral Board at Google I/O, a compact single-board computer designed to run Gemma 3 locally on-device.
User has been fine-tuning Jina-v5 on Slovak legal corpus for a month without success. Model fails to capture Slovak syntactic nuances, especially on ambiguous cases ("krádež" vs "prepadnutie"). Tested multiple approaches: LLM-generated queries, similar chunk injection, logit mining with Qwen 3.5-397B, but fine-tunes consistently underperform base model.
Tomesphere: Chrome extension + website indexing 3M arxiv papers with LLM-curated summaries, OpenReview reviews, GitHub repos, HuggingFace models, citation graphs and SPECTER2 semantic neighbors. Free, no signup.
Sigilant-sweep, an open-source CLI for llama.cpp and vLLM, benchmarks 16 configurations (quantizations, KV cache, context). On Qwen2.5-7B, Q4_K_M beats Q8_0 by 230ms TTFT and +10.7 TPS. Tool measures TPS, TTFT, PPL with p50/p95 and weighted scoring (latency/quality/balanced).
Hugging Face has developed a fully local experience for Reachy Mini, a conversational robot. A blog post details setup and customization for various use cases, including building voice agents without cloud dependency.
YouTube is testing an AI-powered feature that lets users create a fully personalized video feed by dictating their preferences. The system generates recommendations based on user voice or text instructions.
Zai replaced the network architecture on a 1000-GPU cluster running GLM-5.1 from ROFT to ZCube (developed with Tsinghua and HarnetsAI). Results: switch/optical costs down 33%, GPU throughput up 15%, P99 first-token latency down 40.6%. ZCube removes the Spine layer for full bipartite interconnect, eliminating asymmetric traffic hotspots inherent to Prefill-Decode disaggregated inference.
Tomesphere: Chrome extension + web layer enriching arxiv with LLM-curated TLDRs, OpenReview reviews, GitHub/HuggingFace links, citation graphs, SPECTER2 semantic neighbors. 3M papers indexed, free, no signup.
MONET, an Apache 2.0 dataset of 104.9M high-quality images with captions and metadata, released on Hugging Face. Built from 2.9B images and refined. Includes paper, UMAP visualization, text/image retrieval tool, and codebase for training T2I models.
User reports low draft acceptance (40-60%) with Qwen3.5-122B and Qwen3.6-27B in speculative decoding via llama.cpp, versus ~80% expected. Detailed configuration provided with MTP draft, Q6_K_L quantization, batch 2048.
Hugging Face adds a "Base only" toggle on its models page to filter base models and exclude fine-tunes and quantizations. A feature long requested by the community.
Distributed checkpoint storage system on Raspberry Pi 4B cluster (4× workers + Mac mini M4 coordinator). Handles 942 MB checkpoints in safetensors format with automatic replication, mDNS discovery, and Prometheus/Grafana/Loki monitoring. Addresses non-atomic writes, SD card backpressure, and silent corruption bugs.
Mistral rebrands Le Chat as Vibe and integrates it into a multiplatform work agent. Work Mode connects to Google Workspace, Outlook, Slack and GitHub to handle emails, reports and pull requests. Pro subscription drops from €17.99 to €14.99. Mistral positions itself against agent offerings from OpenAI, Google and Anthropic.
PaddleOCR-VL 1.6 is an update to PaddlePaddle's multimodal optical character recognition system. Improves vision and text processing capabilities for image-based content.
Endava uses Codex to build an agentic organization, accelerating software delivery and reducing requirements analysis from weeks to hours.
Crawl4AI is an open-source web crawler and scraper optimized for LLM integration. The project is trending on GitHub.
Harness is a framework that designs domain-specific agent teams, defines specialized agents, and generates the skills they use.
MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and OpenMOSS. It covers stable long-form speech, multi-speaker dialogue, voice design, sound effects, and real-time streaming TTS.
AionUi is a free, local, open-source app compatible with Claude Code, Hermes Agent, Gemini CLI and 20+ other CLIs. Enables customization of AI assistants.
Mastra is a TypeScript framework for building AI-powered applications and agents, created by the team behind Gatsby. Available as open source on GitHub.
Sync-in Server is a secure, open-source platform for file storage, sharing, collaboration, and file syncing.
Firecrawl is an open-source tool to search, scrape, and clean web data for AI agents. It automates web scraping and content preparation for model training or inference.
MCP server for MetaTrader enabling LLMs to execute trades on the MetaTrader platform. Direct integration between AI agents and financial markets.
Robin is an AI-powered OSINT tool for dark web exploration. Available on GitHub, it automates data collection and analysis on illicit marketplaces.
Microsoft releases RAMPART, a pytest-native safety and security testing framework for agentic AI applications. Enables evaluation of security and safety risks in multi-agent systems.
OmniParse: open-source tool to ingest, parse, and optimize any data format (documents, multimedia) for enhanced compatibility with GenAI frameworks.
Claude Code is an agentic coding tool in the terminal that understands your codebase and executes routine tasks, explains complex code, and handles git workflows through natural language commands.
MOSS-TTS is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It covers stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.
Qwen3.6-35B-A3B-APEX quantized by mudler achieves 37 t/s generation with 72K filled context on RTX 3060 12GB via 17.3GB offloading. Spiritbuun's CUDA optimizations (fused MMA, TurboQuant, fattn) + APEX I-Compact quantization yield PPL 3.25. 128K context supported, degrades to 28 t/s @129K.
AMD restricts free access to the Vivado FPGA design tool on Linux, pushing users toward paid licenses or open-source alternatives. The licensing change removes the previously available free tier for Linux users.
US corporations face unexpected AI costs. Spending on infrastructure, tokens, and cloud services exceeds initial budgets, forcing organizations to reconsider deployment strategies.
Xreal launches AR glasses A01 at $299, aiming to make AR technology more accessible. The model seeks to democratize augmented reality glasses against traditionally high market prices.
Meta rolls out paid add-ons for Instagram, Facebook, and WhatsApp globally while building a separate paid AI offering. Zuckerberg finally monetizes massive AI spending.
Krasis v1.0, LLM runtime for models exceeding VRAM, achieves 12.48 tokens/s on RTX 3070 Mobile 8GB with Qwen3.6-35B-A3B (Q4). Full Rust implementation (no Python in hot path) and separate prefill/decode optimizations. Benchmarks: 222 pp, 12.48 tg on laptop; 10,030 pp, 124.9 tg on RTX 5090 32GB.
Removerized is an AI image toolkit running fully in the browser. Free, private, and offline-first with no server dependency.
Ofelia, a Grenoble-based SME specializing in business process management, exemplifies a new AI orchestration paradigm in enterprises. The article examines how to structure AI system integration within organizational workflows.
Amazon MGM Studios and AWS launch a creators' fund and in-house AI platform called 'Project Nara'. Three animated series are in production with five-week timelines for pilots. Amazon claims the only end-to-end AI content ecosystem in the industry.
Qwen releases Q-Judger, a vision-language model based on Qwen3.6-27B for automated evaluation of AI-generated images. The model assesses 5 dimensions (quality, aesthetics, alignment, real-world fidelity, creative generation) using chain-of-thought reasoning and outputs structured JSON scores.
ElevenLabs releases Music v2, an AI music generation model enabling seamless genre transitions (opera, heavy metal, rap) within single compositions. New inpainting feature allows regenerating specific sections independently.
Cognition raises $1B in Series D at a $26B valuation. The company behind Devin, an AI coding agent, positions code as a market with an uncapped TAM.
Discussion on llama.cpp optimizations for long context: comparison of MTP (Multi-Token Prediction), KV cache quantization, and performance. User reports 60 tokens/s with long context on 3090, degradation to 20 tokens/s when cache fills. Qwen 27B Q4 tested.
Claude Opus 4.8 is now available on Vercel AI Gateway. The model excels at long-horizon agentic execution and complex multi-step coding tasks. AI Gateway provides unified API access with usage tracking, performance optimizations, and transparent pricing with no markup.
Nvidia releases LocateAnything, a 3B vision-language grounding model. Uses parallel box decoding, 10x faster than Qwen3-VL. Code and demo available on HuggingFace.
Researchers develop a 'Eureka' machine that discovers physical laws through autonomous exploration, mimicking natural processes. The system outperforms current AI exploration capabilities by generating equations and strategies without direct human supervision.
AI data centers consume growing amounts of electricity, threatening power grid stability. Existing infrastructure struggles to provide the power required by large-scale models and massive deployments.
The frontier reasoning race intensifies: Hy3 preview scores 87.8 on CHSBO 2025, outpacing Gemini 3.1 Pro and GPT-5.4 xhigh. Users question whether these gains reflect real improvements in coding/math or benchmark overfitting.
CapCut launches Design Studio 2.0, an AI-powered platform for graphic creation that replaces traditional templates. The tool offers automated artistic direction for visual design.
Heterogeneous GPU load balancing optimization for Ollama (RTX 5090 + 3090). Custom implementation weights layer distribution by compute power (SMCount × ClockMHz) instead of free memory alone. Result: faster than RTX 5090 standalone, leverages 3090 VRAM without bottlenecking the 5090.
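The compute-weighted split described above can be sketched as a proportional allocation. This is an illustrative sketch, not the author's Ollama patch: the function name, the SM counts and clock values, and the 64-layer model are assumptions, and a real scheduler would also have to check that each share fits in that GPU's free VRAM.

```python
def split_layers(total_layers, gpus):
    """Assign layers to GPUs in proportion to sm_count * clock_mhz.

    Largest-remainder rounding keeps the counts summing to total_layers.
    VRAM limits are deliberately ignored in this sketch.
    """
    weights = [g["sm_count"] * g["clock_mhz"] for g in gpus]
    total_weight = sum(weights)
    exact = [total_layers * w / total_weight for w in weights]
    counts = [int(x) for x in exact]  # floor of each exact share
    leftover = total_layers - sum(counts)
    # Hand the remaining layers to the GPUs with the largest fractional parts.
    by_remainder = sorted(
        range(len(gpus)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return {g["name"]: c for g, c in zip(gpus, counts)}


# Illustrative figures only (assumed, not measured on real hardware).
gpus = [
    {"name": "RTX 5090", "sm_count": 170, "clock_mhz": 2400},
    {"name": "RTX 3090", "sm_count": 82, "clock_mhz": 1700},
]
plan = split_layers(64, gpus)
print(plan)
```

With these assumed figures the faster card receives roughly three quarters of the layers, which matches the stated goal of using the 3090's VRAM without letting it bottleneck the 5090.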
Comparative study of local explainability techniques (LIME, SHAP, Feature Ablation) reliability across 32 tabular datasets. Results show explanation quality does not systematically correlate with model predictive performance, but depends instead on dataset complexity and feature distributions.
Method to translate soft prompts into natural language prompts using a dedicated translation model. Translations outperform InSPEcT across multiple benchmarks. Application: soft prompts optimized on small open-source models convert to portable text prompts that exceed original performance when deployed on closed-API models.
arXiv study on privacy in multi-agent systems. Platform simulates thousands of LLM agents interacting over one month. Privacy violations increase from 19.95% (single-turn) to 45.30% (multi-turn). Agents 8× more likely to disclose sensitive info after observing peer behavior. Explicit privacy instructions reduce but don't eliminate leakage (37.8% minimum).
Debate between models improves weak judge oversight: critic must exceed judge's classification ability for debate to help. On 5 pairings tested on code/logic tasks, 3 show statistically significant gains. Single critique suffices; rebuttal rounds add nothing. Pre-deployment audit proposed.
Small Language Models (SLMs) hallucinate more than LLMs but can solve multi-step questions by inverting the standard strategy: answer first (System-I), then reason deeply (System-II) with evidence retrieval. Initial hallucinations help refine the final answer.
Study on gender preservation in English-to-Hindi translation. Benchmark of 37,345 instances shows GPT-4o-mini and Sarvam frequently erase gender via ergative constructions. Two rerankers (SAR and PAR) improve gender recoverability: PAR increases accuracy from 11-16% to 49-54%, but reduces fluency (4.36→3.37). Reveals preservation-fluency tradeoff.
GraD-IBD reformulates longitudinal ICD trajectories as temporally directed graphs to detect inflammatory bowel disease risk early. A context-aware time-decay message passing mechanism captures temporal dependencies with reduced complexity. Robust results on real-world clinical data.