Cursor's Composer 2.5 matches Opus 4.7 and GPT-5.5 benchmarks at a fraction of the cost

Cursor releases Composer 2.5, a coding model built on Kimi K2.5 and trained on 25x more synthetic tasks than its predecessor. It matches Opus 4.7 and GPT-5.5 benchmark performance at a fraction of the cost.

Code generation Benchmarks Kimi

Hugging Face Blog·May 18

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Hugging Face releases a guide for fine-tuning NVIDIA Cosmos Predict 2.5, a robot video generation model, using LoRA/DoRA. The method reduces GPU resource requirements while maintaining generation quality for specialized robotics use cases.

Fine-tuning Video generation Robotics

Reddit r/MachineLearning·May 18

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

Residual Coupling (RC) connects frozen language models in parallel via lightweight learned linear projections, without modifying their weights. Linear bridges read hidden states from one model and inject additive updates into another's residual stream. On medical data, RC reaches a perplexity of 11.02 versus 56.80 for MoE (reported as an 80.7% improvement) and improves TruthfulQA by 9.1 percentage points.
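
The bridge mechanism described above can be sketched in a few lines of plain Python. This is only an illustration of "read a hidden state, project it linearly, add it to another residual stream"; the names `Bridge` and `couple` and the zero initialisation are assumptions for the example, not details taken from the post.

```python
# Hypothetical sketch of one linear bridge between two frozen models.
# Bridge, couple, and the zero initialisation are illustrative assumptions.
from typing import List

Vector = List[float]


class Bridge:
    """Learned linear map from model A's hidden size to model B's hidden size."""

    def __init__(self, d_in: int, d_out: int) -> None:
        # Zero init (an assumption): the coupled output starts equal to model B alone.
        self.weight = [[0.0] * d_in for _ in range(d_out)]

    def project(self, hidden_a: Vector) -> Vector:
        # Plain matrix-vector product; only these weights would be trained.
        return [sum(w * h for w, h in zip(row, hidden_a)) for row in self.weight]


def couple(residual_b: Vector, hidden_a: Vector, bridge: Bridge) -> Vector:
    """Add the projected hidden state of model A to model B's residual stream."""
    update = bridge.project(hidden_a)
    return [r + u for r, u in zip(residual_b, update)]
```

Both base models stay frozen in this picture; the bridge weights are the only trainable parameters.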

Llama Multi-agent Fine-tuning

Reddit r/LocalLLaMA·May 18

I tested 42 LLMs on their willingness to build the apocalypse. The "safest" closed-source models are lying to you.

DystopiaBench tests how well 42 LLMs (open and closed-source) refuse dangerous requests that are progressively normalized. It covers 6 dystopia categories (autonomous weapons, surveillance, behavioral control, etc.) with 5 escalation levels. Finding: models detect obviously harmful requests but fail when the request is hidden behind dual-use framing and normalization. The benchmark is open source.

Benchmarks AI safety Alignment

Reddit r/LocalLLaMA·May 18

Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings (llama.cpp, ik_llama.cpp, BeeLlama, vllm)

Detailed benchmark of Qwen 3.6 27B on an RTX 3090 (24GB). ik_llama.cpp outperforms llama.cpp and BeeLlama, reaching 1261 tok/s prefill and 72.9 tok/s decode at 156k context. Optimal setup: IQ4_KS quantization, multi-token prediction, flash attention.

Qwen Code generation Benchmarks

Reddit r/LocalLLaMA·May 18

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

SmallCode, a local coding agent, achieves 87% on benchmarks with Gemma 4B using compound tools, iterative improvement loops, and optimized context management. Unlike existing agents (OpenCode, Cursor, Claude Code) requiring large models, SmallCode is designed for small local models with optional escalation to Claude/OpenAI.

AI Agents Code generation Open source

Simon Willison·May 15

datasette-llm-limits 0.1a0

Release of datasette-llm-limits 0.1a0, a plugin for Datasette enabling per-user or global spending limits for LLM usage. Supports daily limits with rolling windows and USD amounts.
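
As a rough illustration of what a rolling-window USD limit means, here is a minimal sketch. It is not the plugin's code or configuration format; the class `RollingSpendLimit` and its methods are hypothetical names, and the 24-hour default window is an assumption based on "daily limits".

```python
# Hypothetical sketch of a rolling-window spending limit.
# Not the datasette-llm-limits implementation; all names are illustrative.
from collections import deque
from typing import Deque, Tuple


class RollingSpendLimit:
    def __init__(self, limit_usd: float, window_seconds: float = 86400.0) -> None:
        self.limit_usd = limit_usd
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, usd)

    def _spent(self, now: float) -> float:
        # Drop spend records that have aged out of the rolling window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()
        return sum(usd for _, usd in self._events)

    def try_spend(self, usd: float, now: float) -> bool:
        """Record the spend and return True if it fits under the limit."""
        if self._spent(now) + usd > self.limit_usd:
            return False
        self._events.append((now, usd))
        return True
```

Unlike a limit that resets at midnight, a rolling window frees up budget gradually as each earlier charge ages past the window length.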

Tools Open source

OpenAI Blog·May 15

A new personal finance experience in ChatGPT

OpenAI launches a personal finance feature in ChatGPT Pro (US market). Users can securely connect bank accounts to receive AI-powered insights and guidance tailored to their financial context, goals, and priorities.

OpenAI Business

Vercel AI Blog·May 15

Sort providers by cost, latency, or throughput on AI Gateway

Vercel AI Gateway now enables sorting providers by cost, time to first token (TTFT), or throughput (TPS). Sorting happens at request time, automatically reflecting price changes and performance shifts without code changes. Compatible with Zero Data Retention and existing routing options.
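
To make the three sort orders concrete, here is a small sketch of ranking candidate providers at request time. It does not use the AI Gateway API; the `Provider` fields and the `sort_providers` function are hypothetical, and the metric values would come from whatever live data the router has.

```python
# Hypothetical sketch of request-time provider sorting.
# Provider and sort_providers are illustrative names, not the AI Gateway API.
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    name: str
    cost_per_mtok: float  # USD per 1M tokens
    ttft_ms: float        # time to first token, in milliseconds
    tps: float            # throughput, in tokens per second


def sort_providers(providers: List[Provider], by: str) -> List[Provider]:
    """Return providers ordered best-first for the chosen criterion."""
    if by == "cost":
        return sorted(providers, key=lambda p: p.cost_per_mtok)
    if by == "ttft":
        return sorted(providers, key=lambda p: p.ttft_ms)
    if by == "throughput":
        # Higher throughput is better, so sort in descending order.
        return sorted(providers, key=lambda p: p.tps, reverse=True)
    raise ValueError(f"unknown sort key: {by}")
```

Because the ordering is computed per request from current metrics, a price or latency change alters the ranking without any change to calling code.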

Tools Infrastructure Business

Latent Space·May 14

AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

Abridge aims to turn patient-clinician conversations into healthcare's operating system. The platform has processed 100M doctor visits, saves clinicians 10-20 hours, and uses AI to cut prior authorization to minutes.

AI Agents Voice Business

Simon Willison·May 14

datasette-ip-rate-limit 0.1a0

Release of datasette-ip-rate-limit 0.1a0, a configurable rate-limiting plugin for Datasette. Built with Codex (GPT-5.5 xhigh) to block aggressive crawlers. Production config on datasette.io with per-path rules (60 requests/60s, 20s block).
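
A rule such as "60 requests/60s, 20s block" can be read as a sliding-window counter with a temporary ban. The sketch below shows one plausible interpretation; it is not the plugin's implementation, and `IpRateLimiter` and its parameters are hypothetical names.

```python
# Hypothetical sketch of a per-IP sliding-window rate limit with a block period.
# Not the datasette-ip-rate-limit implementation; all names are illustrative.
from collections import defaultdict, deque
from typing import Deque, Dict


class IpRateLimiter:
    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0,
                 block_seconds: float = 20.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._blocked_until: Dict[str, float] = {}

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request from this IP should be served."""
        if now < self._blocked_until.get(ip, 0.0):
            return False  # still inside a block period
        hits = self._hits[ip]
        # Forget requests that fell out of the sliding window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            return False
        hits.append(now)
        return True
```

Per-path rules could be modelled by keeping one such limiter per path prefix, though how the plugin actually scopes its rules is not stated in the summary.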

Tools Open source Code generation

OpenAI Blog·May 13

Our response to the TanStack npm supply chain attack

OpenAI details its response to the TanStack "Mini Shai-Hulud" npm supply chain attack, outlines protections for its systems and signing certificates, and requires macOS app updates by June 12, 2026. The post also describes strengthened defenses against evolving supply chain threats.

OpenAI AI safety

Simon Willison·May 12

datasette 1.0a29

Datasette 1.0a29 adds a TokenRestrictions.abbreviated() utility method, improves table header visibility for empty tables, fixes a Mobile Safari bug in the column actions dialog, and resolves a race condition between Datasette.close() and Database.close() that caused segfaults.

Open source Tools Infrastructure

Vercel AI Blog·May 12

Create Vercel Firewall rules with natural language

Vercel Firewall now enables creating custom WAF rules using natural language. Users describe desired behavior and the dashboard generates the rule. Available via dashboard or Vercel CLI.

Tools Infrastructure

Vercel AI Blog·May 12

Fast mode for Opus 4.7 available on AI Gateway

Vercel AI Gateway launches Fast Mode for Claude Opus 4.7 in research preview. Output token generation is ~2.5x faster while maintaining full Opus 4.7 intelligence. Pricing: 6x standard rates (input $30/1M, output $150/1M tokens).
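
The pricing above is easy to check with a few lines of arithmetic. In this sketch the fast-mode rates come from the summary, and the standard rates are derived by dividing by the stated 6x multiplier (an inference, not a quoted price); `request_cost` is a hypothetical helper.

```python
# Hypothetical cost helper based on the figures in the summary above.
FAST_INPUT_USD_PER_MTOK = 30.0    # stated fast-mode input price
FAST_OUTPUT_USD_PER_MTOK = 150.0  # stated fast-mode output price
FAST_MULTIPLIER = 6.0             # stated "6x standard rates"


def request_cost(input_tokens: int, output_tokens: int, fast: bool) -> float:
    """Return the USD cost of one request in fast or standard mode."""
    in_rate = FAST_INPUT_USD_PER_MTOK
    out_rate = FAST_OUTPUT_USD_PER_MTOK
    if not fast:
        # Standard rates inferred from the 6x multiplier (an assumption).
        in_rate /= FAST_MULTIPLIER
        out_rate /= FAST_MULTIPLIER
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate
```

On these numbers, fast mode trades roughly 2.5x faster output generation for 6x the price per token.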

Claude Claude Code Infrastructure

OpenAI Blog·May 12

What Parameter Golf taught us about AI-assisted research

OpenAI's Parameter Golf competition gathered 1,000+ participants and 2,000+ submissions to explore AI-assisted ML research, coding agents, quantization, and novel model design under strict constraints. The initiative demonstrates how AI tools accelerate research innovation.

OpenAI AI Agents Benchmarks

OpenAI Blog·May 11

OpenAI launches DeployCo to help businesses build around intelligence

OpenAI launches DeployCo, a new enterprise deployment company to help organizations bring frontier AI models into production and generate measurable business impact.

OpenAI Business AI Agents

Vercel AI Blog·May 11

Vercel Sandbox firewall now supports request proxying and filtering

Vercel Sandbox firewall now supports forwarding HTTP requests to a controlled proxy, with filtering via matchers and credentials brokering. Available in beta for Pro and Enterprise plans via @vercel/sandbox SDK.

Infrastructure Tools

Vercel AI Blog·May 10

How Superset built the IDE for AI agents on Vercel

Superset, an IDE for multi-agent development founded by former CTOs, enables directing up to 10 coding agents in parallel on Vercel. Each agent operates in an isolated environment with its own branch and live URL, eliminating serialization bottlenecks of traditional CI pipelines.

AI Agents Multi-agent Code generation

OpenAI Blog·May 8

Running Codex safely at OpenAI

OpenAI outlines its security measures for Codex: sandboxing, approvals, network policies, and agent-native telemetry. The framework is designed to enable safe production deployment of coding agents.

OpenAI AI Agents AI safety

Latent Space·May 8

[AINews] GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs

OpenAI releases three new voice APIs: GPT-Realtime-2 for real-time conversation, GPT-Translate for instant translation, and GPT-Whisper for transcription. These tools aim to set new standards in voice processing.

OpenAI GPT Voice

OpenAI Blog·May 7

Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

OpenAI launches GPT-5.5 and GPT-5.5-Cyber through Trusted Access for Cyber program, enabling verified security researchers to accelerate vulnerability research and protect critical infrastructure.

GPT OpenAI AI safety

OpenAI Blog·May 7

Advancing voice intelligence with new models in the API

OpenAI releases new realtime voice models in its API with reasoning, translation, and transcription capabilities for more natural and intelligent voice experiences.

OpenAI Voice

Vercel AI Blog·May 7

Vercel Flags now supports JSON values

Vercel Flags now supports JSON values alongside booleans, strings, and numbers. A single flag can hold a full AI model configuration (for example temperature and max_tokens) instead of spreading it across several separate flags. This is useful for progressive traffic routing or quick model switching.

Tools Infrastructure

OpenAI Blog·May 7

Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to fund free access, with clear labeling, answer independence, strong privacy protections, and user control over ad preferences.

OpenAI Business

OpenAI Blog·May 7

Introducing Trusted Contact in ChatGPT

OpenAI launches Trusted Contact in ChatGPT, an optional safety feature that alerts a trusted contact if serious self-harm concerns are detected. The rollout is gradual and requires user consent.

AI safety OpenAI

Vercel AI Blog·May 6

Secure Marketplace credentials with Production-only access

Vercel adds access control for marketplace integration credentials: Production-only restriction hides sensitive variables from dashboard and CLI, blocks non-production connections, and requires Owner permissions to revert.

Tools Infrastructure AI safety

OpenAI Blog·May 5

Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)

OpenAI releases MRC (Multipath Reliable Connection), a supercomputer networking protocol designed to improve resilience and performance in large-scale AI training clusters. Available via OCP (Open Compute Project).

OpenAI Infrastructure Open source

OpenAI Blog·May 5

New ways to buy ChatGPT ads

OpenAI launches a beta self-serve Ads Manager for ChatGPT with CPC bidding and enhanced measurement tools, designed to keep user conversations separate from ad data while protecting privacy.

OpenAI Business

Vercel AI Blog·May 1

Postgres connections now work through Sandbox firewall

Vercel Sandbox now supports connections to hosted Postgres databases (Neon, Supabase, AWS RDS, Nile, Prisma). The firewall detects Postgres startup sequence, waits for TLS upgrade, then applies domain policies. No code changes needed.

Infrastructure Tools

Vercel AI Blog·Apr 30

Grok 4.3 on AI Gateway

Grok 4.3 is now available on Vercel AI Gateway with a 1M token context window and improvements in accuracy, tool calling, and instruction following. The gateway provides unified API access with usage tracking, intelligent retries, and automatic provider routing.

Tools Infrastructure

Vercel AI Blog·Apr 30

Custom tags available in beta on Vercel Sandbox

Vercel Sandbox launches custom tags in beta to organize isolated environments. Each sandbox supports up to 5 tags, enabling filtering by environment, team, or customer, promoting from staging to production, and tracking usage for multi-tenant billing attribution.

AI Agents Code generation Tools

OpenAI Blog·Apr 28

OpenAI models, Codex, and Managed Agents come to AWS

OpenAI makes GPT models, Codex, and Managed Agents available on AWS, enabling enterprises to deploy AI securely within their AWS environments.

OpenAI GPT AI Agents

OpenAI Blog·Apr 27

An open-source spec for orchestration: Symphony

OpenAI releases Symphony, an open-source spec for orchestrating Codex agents through issue trackers, turning them into always-on autonomous systems. Reduces context switching and boosts engineering productivity.

OpenAI AI Agents Code generation

OpenAI Blog·Apr 23

GPT-5.5 Bio Bug Bounty

OpenAI launches a bug bounty for GPT-5.5 to identify universal jailbreaks for biosafety risks, with rewards up to $25,000. The red-teaming initiative is intended to strengthen security before deployment.

GPT OpenAI AI safety

OpenAI Blog·Apr 22

Making ChatGPT better for clinicians

OpenAI makes ChatGPT for Clinicians free for verified U.S. physicians, nurse practitioners, and pharmacists. The tool supports clinical documentation, patient care, and medical research.

OpenAI Business

Google DeepMind·Apr 22

Decoupled DiLoCo: A new frontier for resilient, distributed AI training

Google DeepMind introduces Decoupled DiLoCo, a distributed training method that improves resilience and efficiency for large-scale AI model training. The technique decouples local and global updates to reduce latency and the impact of network failures.

DeepMind Infrastructure Reinforcement learning

OpenAI Blog·Apr 22

Speeding up agentic workflows with WebSockets in the Responses API

OpenAI speeds up agentic workflows with WebSockets in the Responses API, cutting model latency and API overhead through connection-scoped caching. The improvements are documented for the Codex agent loop.

OpenAI AI Agents Infrastructure

OpenAI Blog·Apr 22

Introducing workspace agents in ChatGPT

OpenAI launches workspace agents in ChatGPT: Codex-powered agents that automate complex workflows, run in the cloud, and enable secure team collaboration across tools.

OpenAI AI Agents Code generation

Hugging Face Blog·Apr 21

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

Hugging Face launches QIMMA, a quality-focused Arabic LLM leaderboard. The platform evaluates Arabic language models against rigorous criteria, providing a transparent benchmark for Arabic language performance.

Benchmarks Open source Evals
