datasette 1.0a32
Datasette 1.0a32 fixes a bug with INSERT ... RETURNING queries via the new /db/-/execute-write endpoint and multiple base_url issues found during Service Worker experiments.
Research on semantic step prediction in LLM reasoning trajectories. Multi-step latent forecasting method via step sampling to improve language model reasoning performance.
Workshop on Unlearning and Model Editing (U&ME) at ECCV 2026. Platform to discuss techniques for modifying or removing specific knowledge from AI models without full retraining.
G7 reaches agreement on shared terminology distinguishing open-source AI from open-weights AI. Governments formalize definitions already understood by the technical community.
Developer trained GPT-1 (about 117M parameters) on an RTX 2060 Super 8GB in 1 hour. Demonstrates that gamers can now pre-train specialized <1B models locally without cloud infrastructure. Code and model released on GitHub and HuggingFace.
NVIDIA Parakeet speech-to-text ported to C++/ggml without Python or PyTorch. Byte-for-byte identical output to NeMo, up to 5x faster on GPU for larger models, 600x realtime on audio clips. Quantized GGUFs (f16, q8_0, q6_k, q5_k, q4_k), flat C API, integrated in LocalAI with OpenAI-compatible endpoint.
A ChatGPT extension for Google Sheets exfiltrates workbook data without explicit consent. Users believe they interact with OpenAI while the extension accesses entire spreadsheet contents.
User trained GPT-1 on RTX 2060 Super (8 GB VRAM) in ~1 hour using Claude-generated code based on original implementation. Cost to reproduce GPT models dropped 500–1000× since GPT-2 ($43,000 → $48 per H100 cluster run).
Technical discussion on VRAM overflow mechanics in llama.cpp. User runs Gemma-4 26B (21GB) on RX6600XT + Ryzen 7 5700X with 32GB RAM, achieving ~20 tokens/s decode. Question: how is the CPU/GPU split handled, and how much do PCIe speed and CPU performance each matter?
University of Chicago researchers created a tool to detect AI-generated songs. The tool analyzes audio characteristics to identify typical signatures of synthetic generation.
Llama Studio v0.2.0 replaces JSON model config with per-model shell scripts, adds GPU splitting with tensor-split detection, and introduces session store with autoload on startup. Open-source WebUI for managing llama-server instances.
Netflix Wiz created an app to reduce AI infrastructure costs and open sourced it. The tool helps organizations optimize their AI spending.
Benchmark on 9070XT GPU: Qwen 35B A3B MTP achieves 43.74 T/s vs 38.07 T/s standard mode. MTP shows ~15% throughput gain despite multi-token prediction overhead. Identical test conditions (prompt, 8192 context, Q4_K_XL quantization).
Study on the real operational impact of LLM use in production. Analyzes measurable costs, latencies, and productivity gains versus marketing claims.
Connecticut signed into law a requirement that employers notify employees before using AI for employment decisions. The measure aims to increase transparency and strengthen worker rights regarding AI systems.
Ouijit is an open-source task and terminal manager for coding agents. Enables management of AI agent execution in development environments.
Benchmark on Radeon 7900 XTX: Qwen3.6-35B vs Gemma4-26B with reasoning enabled. Qwen generates 2x more tokens (14,811 vs 7,386) but Gemma is ~20% faster end-to-end (95.6s vs 118.8s). Qwen's MTP reaches 130 tok/s vs 78 tok/s, but token count becomes the bottleneck. Quality close, interesting per-task splits.
PewDiePie released Odysseus, a web UI/harness for local LLMs. The creator, without formal programming background (mechanical engineering studies), provides a non-developer perspective on local model accessibility.
Odysseus is a self-hosted AI workspace. The project offers an open-source alternative to proprietary cloud platforms for running AI models and workflows locally.
CVPR Workshop Radar aggregates CVPR 2026 workshops and tutorials into a searchable web interface. Search by title/organizer/topic, filter by date/type/program availability, personal schedule, timeline view. Automated pipeline: PDF extraction → scraping → LLM processing. Open source, offline-capable, no account required.
User reports adding an RTX 2070 Super (8 GB VRAM) to his high-end rig (RTX 5090, 9800X3D, 96 GB RAM) enables running Qwen 3.6-27B at Q8_0 with 144k context at 40-70 tok/s. Takeaway: more VRAM > raw performance for local inference.
Bonsai Image 4B is a 1-bit quantized image generation model designed to run on local devices. The model compresses weights to 1-bit to drastically reduce size and computational requirements, enabling inference on resource-constrained hardware.
mlx-Chronos is an open-source CLI tool and community leaderboard to compare MLX inference engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama). Measures TTFT, throughput, RAM, and thermal state with standardized methodology. Leaderboard currently populated by M2 8GB, seeking M3/M4 results.
Study examines AI bots' tendency to ignore scientific evidence. Current models fail to systematically follow empirical data, raising concerns about their reliability for scientific research.
Vercel AI Chat SDK adds support for Lark and Feishu via a new official vendor adapter. Bots can post, edit, and delete messages, stream replies via Lark's native cardkit typewriter API, send interactive cards, and react with emojis. Connection uses Lark's WebSocket transport without requiring HTTP webhook exposure.
Systematic comparison of 13 abliterated Gemma 4 E2B variants across 44 GPU hours. coder3101 achieves 96% ASR (refusals) with full capability preservation and outperforms base model on math. Surgical approaches preserve performance better than aggressive methods, with some losing up to 6.9 points on GSM8K.
Developer open-sources an AI accelerator on FPGA (AWS F2) based on RocketChip/RISC-V with the attention mechanism built into hardware. Benchmarks: 225× speedup on vanilla attention, 96× on TinyBERT, 50× on ViT, 30× on GPT-2 prefill. Native BF16 support.
User built a DIY cooling enclosure for 2 DGX Spark units using a 3D-printed Thingiverse design (PETG filament). A 120mm fan is driven by an AC Infinity thermostat controller whose temperature probe adjusts fan speed to the cluster's heat output.
Hermes WebUI is a web interface to use Hermes Agent from a browser or mobile device. Open-source project trending on GitHub.
Pi-subagents is an extension for async subagent delegation with truncation, artifacts, and session sharing. Open-source project trending on GitHub.
AppFlowy-Cloud is an open-source collaborative workspace with integrated AI, a Notion alternative. Manages projects, wikis, and teams while maintaining data control.
Arnis is a tool that generates real-world locations in Minecraft with high detail. The project uses AI models to convert geographic data into Minecraft structures.
Golem Cloud is an agent-native platform for building AI agents and distributed applications that never lose state, never duplicate work, and never require infrastructure management.
Sandcastle is a TypeScript library to orchestrate sandboxed coding agents. It enables isolated code execution via sandcastle.run().
Open-source course on building production agentic RAG systems. Covers architecture, implementation patterns, and best practices for deploying agentic retrieval-augmented generation systems.
ComfyUI is a modular GUI for diffusion models with a node/graph-based interface, providing API and backend capabilities for image generation.
Kaikaku.AI releases Epicure, three AI models separating ingredients by recipe compatibility or chemical similarity. Trained on 4.14 million multilingual recipes and FlavorDB, they generate different recommendations per source. The chemistry-only model outperforms recipe-based variants on taste and nutrition classification without direct data.
Production challenges with diffusion models: handling GPU load spikes, cold starts, and inference costs. Scaling from 100 to 10k requests exposes architectural issues and multi-tenancy problems.
Reddit user reports DeepSeek v4 Pro achieves 8% pass rate on DeepSWE benchmark, contrasting with their perception of near-parity with Claude Sonnet 4.6 in practice. Link to DeepSWE benchmark provided.
Stepfun 3.7 Flash delivers quality close to GLM 5.1 with 80% 3D world understanding while using 75% fewer parameters and featuring built-in vision. Recommended for RAM-constrained setups.
Flash Attention optimization for llama.cpp on RDNA3 GPUs: 47% VRAM reduction vs Vulkan f16. Packs four 8-bit K-values into native sudot4 instructions without lossy quantization. At 128k context with MTP draft: 21.76 GiB vs 23.18 GiB (1.42 GiB savings). Quality preserved: mean KLD 0.00455 (q4_0 V), 97.06% identical top tokens.
A user shares a Tampermonkey script to add a reasoning toggle button in llama.cpp web chat for Qwen 3.6. The script intercepts API requests and controls the enable_thinking parameter without recompiling the source code daily.
Bloc is an open-source package manager for local AI models, agents, and tools. It packages complete setups (model, runtime, dependencies, environment variables) into versioned recipes executable via CLI. Similar to npm for AI workloads, with automatic hardware detection and dependency management.
Anthropic bans AI tools during job interviews to assess candidates' actual thinking. Up to five rounds test skills, values, and ethical reasoning. Salaries reach $850,000. Some applicants pay $4,600 for prep coaching run anonymously by current company employees.
llama.cpp benchmark comparing Windows 11 and Linux (Ubuntu 26.04) on Nvidia GPUs (RTX 5080 + 2× RTX 5060 Ti). No significant performance difference: Qwen 3.5 122B achieves PP 300/TG 28 (Windows) vs PP 290/TG 28.5 (Linux); Qwen 3.5 397B achieves PP 140/TG 16 (Windows) vs PP 150/TG 15.2 (Linux). Tests repeated 4 times with a recent llama.cpp build including VRAM optimization.
PolyRange is a cybersecurity AI benchmark that dynamically generates fresh web targets for each evaluation, eliminating training corpus contamination. The author addresses consensus from labs (Anthropic, OpenAI, DeepMind): static benchmarks are saturated and real-world defenses are missing. MIT-licensed, independent from the author's commercial project.
An Anthropic study finds researchers with typically male names use AI coding agents more than twice as often as those with typically female names, controlling for discipline and career level. Economists lead at 39%, education researchers at 4%. The gender gap for coding agents far exceeds that for general AI use.
SoftBank plans to build AI data centers with 5 GW capacity in France for up to 75 billion euros, its largest AI infrastructure investment in Europe. 45 billion euros of facilities are set to go live by 2031 across three northern France sites.
AI search agents like GPT-5.4 and Kimi K2.6 mostly confirm their training knowledge rather than genuinely researching the web. Researchers at Harbin Institute of Technology demonstrated this using LiveBrowseComp, a benchmark based on events from the last 90 days. Without relying on training memory, performance collapses.
Novel approach for autonomous AI agents: using memory as action to manage context for long-horizon tasks. The system actively selects which information to retain and use, improving performance across extended horizons.
MiniMax M3, MiniMax's first model with 1M-token context window and native multimodality, is now available on Vercel AI Gateway. M3 excels at software engineering, terminal-based tool use, and agentic web browsing, optimized for multi-turn collaboration.
Developer built an ebook reader with embedded translation model based on llama.cpp. Local application for multilingual readers: AI translation, sticky notes, bookmarks, reviews, searchable annotations. Uses compact models (4B-70B) without cloud dependency.
Komi-learn is a framework for coding agents with continuous memory and self-improvement capabilities. The project enables agents to learn from past experiences and improve performance over time.
Mudler releases APEX GGUF quantizations of Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled with bundled MTP (multi-token prediction) head. Files enable self-speculative decoding via llama.cpp without separate draft model. Size +2.5% vs non-MTP version, MTP head quantized Q8_0 for high draft accuracy.
Dell confirms XPS laptop with NVIDIA N1X GPU (based on DGX Spark GB10 architecture) for consumer market running Windows. Official announcement at Computex.
Anthropic calculates its 'run-rate revenue' in two parts: last 28 days of consumption-based sales × 13, plus monthly subscriptions × 12. This metric, reported by Reuters, raises questions about actual revenue measurement.
User showcases personal data center: 4 systems (Threadripper 3960X + 4×3090 Ti, Xeon 8352 + 4×5070 Ti, Intel 14700K + 5090, Ryzen 5950X + 2×5070 Ti). Runs Qwen 27B for coding, Nemotron for STT, trains TTS LoRA. Agentic systems work overnight on repos with zero token cost.
Benchmark of inference engines on M1 Max 64GB comparing rapid-mlx, omlx, mlx-lm, and ollama with Qwen 3.5-4B. Rapid-mlx leads on speed and memory efficiency. Results submitted to mlx-chronos community leaderboard.
Scammers are using AI-generated images of fake Black people to promote Shein products on social media. Fraudulent marketing practice exploiting image generation and racial bias.
Developer builds custom inference engine in Rust and Metal to eliminate setup friction for local LLMs. One-click app includes model selection, tools, MCP support, and performance optimization. Repository and app launching June 1st, free and open-source.
Starbucks abandons a faulty AI inventory management tool that failed to accurately count stock. The system did not meet operational expectations.
A r/LocalLLaMA user highlights an inversion: the community self-hosts models (hardest part) but outsources tooling (tracing, evals, monitoring) to SaaS. He argues open-source solutions (Langfuse, ragas, Open WebUI) now enable hosting the full stack locally without external calls.
Anthropic publishes detailed documentation on sandboxing techniques across Claude.ai, Claude Code, and Cowork. Uses gVisor (Claude.ai), Seatbelt/Bubblewrap (Claude Code local), and full VMs (Cowork). Includes process sandboxes, filesystem boundaries, and egress controls to prevent credential exfiltration.
TCO analysis of a $6.4k local LLM server with 4x MI100 32GB GPUs and EPYC 48-core CPU. Runs 4 llama.cpp instances with Qwen 3.6 27B on ROCm. Processes 20.4M input tokens and 1.32M output tokens daily. Equivalent API cost: $3,701/year ($308/month). Author emphasizes proper hardware depreciation accounting for realistic TCO.
Simon Willison used Claude Opus 4.8 via Claude Code to implement running Python ASGI apps in the browser via Pyodide and Service Workers. This approach replaces the previous Web Workers implementation, enabling JavaScript execution and fixing Datasette Lite limitations. Working demos are available.
User reports successful execution of Qwen 3.6 35B MoE on M1 Max with Zoo Code. MoE model running locally, offline, on battery power.
768GB of Intel Optane DIMMs enables running a 1-trillion-parameter LLM on a single GPU at 4 tokens/second. Hardware configuration for inference of very large models without distributed infrastructure.
One million ancient Greek text fragments will be translated using AI. The project leverages vision and language models to decipher damaged manuscripts and generate automated translations.
A r/LocalLLaMA user built an autonomous agent with Qwen 3.5 27B enhanced by short/long-term memory (memory.md file, daily summaries, self-reflections). The agent handles complex tasks (app creation, web search, software installation). User prefers this local setup over GPT/Gemini for UX despite lower raw capability.
Parallax is a parameterized Local Linear Attention mechanism for LLMs derived from statistical regression. It replaces softmax's local constant estimate with a linear estimate, yielding better bias-variance tradeoffs. Pretrained at 0.6B and 1.7B scales, Parallax shows consistent perplexity improvements and matches or outperforms FlashAttention 2/3 in decoding.
NVIDIA quantized Alibaba's Qwen3.6-35B-A3B model to NVFP4 (4-bit) using Model Optimizer. Weight reduction from 16 to 4 bits per parameter cuts GPU memory and disk size by ~3.06x. Benchmark results show minimal accuracy loss: MMLU Pro 85.6→85.0, GPQA Diamond 84.9→84.8.
An open source project contains a hidden instruction targeting AI agents, commanding them to delete the code. Discovery reveals security risks from automated code execution by AI systems.
OpenRouter raises $113M Series B. The LLM API aggregation platform strengthens funding to expand model offerings and infrastructure capabilities.
SupraLabs released Supra-50M-Instruct, a 51.8M parameter model ranking #1 trending on Hugging Face (≤1B category). 7.65k downloads in 9 days, outranking Gemma-3-1B and Qwen3-0.6B. Demonstrates community interest in efficient models runnable on modest hardware.
Microsoft and Nvidia partner on AI PCs running autonomous agents locally via OpenClaw framework, replacing Copilot+. Dell and Surface will unveil first models at Computex and Build next week.
Researcher asks how to fine-tune an LLM for open-ended math problems (proofs). Standard SFT and RLHF inadequate; seeks appropriate method using MathNet dataset.
Straightforward method to train an LLM from scratch: data download, preprocessing, and text generation. GitHub repo with executable code.
Anthropic releases a public repository for Agent Skills, reusable components for AI agents. The project enables development and sharing of standardized agent capabilities.
Vite+ is a unified toolchain and entry point for web development that centralizes runtime, package manager, and frontend toolchain in a single place.
Stalwart is an all-in-one open-source mail and collaboration server supporting IMAP, JMAP, SMTP, CalDAV, CardDAV, and WebDAV. Designed for security and scalability.
Zenoh is an open-source middleware unifying pub/sub, geo-distributed storage, queries and computations. It optimizes time and space efficiency beyond mainstream stacks.
Qwerty-learner is vocabulary learning and English muscle memory training software designed for keyboard workers. Combines word memorization with typing practice.
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows. It enables orchestration of complex data pipelines with dependency management and real-time monitoring.
Large-scale study (208,000 participants, 26 million responses) reveals that the training that makes language models helpful weakens their ability to replicate human behavior. The effect worsens with each model generation. Demographic profiles (persona trick) provide no meaningful benefit for individual predictions.
User reports 125 tokens/s with Qwen 3.6 Q4 quantized on 2x RTX 4060 Ti (~$1000, 32GB VRAM). Outperforms high-end 2026 mini-PCs at fraction of cost. Testing CUDA 13.3 optimization to reach 150 tok/s.
Two ML students question whether robotics faces a data scarcity problem. After normalizing public datasets, they suspect the real issue is interoperability: heterogeneous schemas, different sensors, incompatible coordinate frames. They ask robotics teams whether they would actually use data from other teams through a unified API.
Major US corporations are rationing AI usage as infrastructure and API costs skyrocket. AI budgets become bottlenecks, forcing organizations to prioritize use cases and restrict access to expensive models.
Terence Tao argues AI could introduce division of labor in mathematics for the first time. Currently, researchers master every step alone (problem framing, verification). Tao foresees "industrial mathematics": AI-supported teams replacing lone geniuses, with humans remaining essential for "inspired guesses."
Helios is a tool that estimates potential solar generation for any address in Britain. Uses geographic and weather data to calculate residential solar panel yield.
Attackers exploit ChatGPT and Claude's chat-sharing features to distribute malware. Fake chats mimic error messages or installation guides and bypass security tools by being hosted on trusted domains.
OpenAI deploys Codex on Windows 11 with 'Computer Use' feature enabling AI to autonomously control programs, test applications, and detect bugs. ChatGPT mobile app allows users to launch and monitor these tasks remotely.
Gryphe releases Pantheon-Reasoning-27B, an uncensored Qwen 3.6 27B model fine-tuned on roleplay data with full reasoning traces. Trained on Pantheon corpus (~28%), Claude Opus 4.6 reasoning traces (~21%), WorldSim narrative data (~16%), and text adventure content (~16%), the model tests whether reasoning improves roleplay quality. GGUF quantizations available.
Open-source project generating sound effects from vocal imitations and text input. User records a voice imitation of the desired sound, the model combines it with text description to produce the final audio effect. Demo available on GitHub repo.
Salesforce claims it migrated its entire dev org to Anthropic's Claude Code in 13 days instead of 231 planned, reporting 79% more pull requests per developer and 5% fewer incidents in April 2026. Numbers cannot be independently verified.
Vidai Community, an open-source Rust binary, unifies cost attribution, guardrails, and multi-provider routing for LLM calls. One-line integration by changing base_url (OpenAI/Anthropic/Google). Per-user/team/model cost tracking, hard budgets, 1.95ms median overhead, 21,803 RPS on a single node.
Developer built NeuralDBG, a PyTorch debugger that automatically detects training failures (vanishing/exploding gradients, data anomalies). Key insight: failures are layer-localized, not global. Effective monitoring: gradient norm transitions per layer rather than raw histograms. Open-source tool available on PyPI.
Meta is developing AI wearables: an AI pendant and enterprise "supersensing" glasses. After billions invested in AI with limited commercial returns, its open-source strategy has underperformed. Meta is pivoting to hardware.