Rsync 3.4.3 has hundreds of Claude commits
Rsync 3.4.3 contains hundreds of commits generated by Claude. The file synchronization tool has integrated code produced by Anthropic's AI model in its latest release.
A user successfully ran an RTX Pro 6000 Blackwell GPU in a 2016-era Dell PowerEdge R730 server, achieving a 650k context window. The project required firmware archaeology, PCIe workarounds, and physical modifications to bridge incompatibilities between the server's legacy architecture and the GPU's modern requirements.
Shadow AI is an open-source (AGPL-3.0) local voice assistant for Windows. Natural multilingual conversations, local web search via SearXNG, persistent memory, optional Google integrations (Gmail, Calendar, Drive). Uses user's free Gemini API key, zero remote servers.
MOSS-TTS v1.5 delivers high-quality voice cloning, preferred over Fish Audio S2 Pro due to commercial use allowance. Long Cat DiT 3.5 noted as another strong model.
Spiking neuron library optimized to fit in CPU cache. Benchmarked against PyTorch on Wikipedia dataset. Built with Gemini Flash 3.5.
VT Code is an open-source terminal coding agent written in Rust. It executes programming tasks directly from the command line.
Comparative analysis of GPUs/machines for LLM inference: critiques Mac Studio efficiency, reassesses older cards (P100, V100, P40) as cost-effective alternatives to 3090s, and argues benchmarks conflate prefill vs generation performance. Author collecting power consumption and prefill data.
User benchmarks Flash Attention 2 (ai-bond) on V100. Results show 7-24x speedup in backward pass, memory reduction up to 91.9% (323.4 MB saved). Thinking time before answering minimized. Numerical validation passes on causal and non-causal configurations.
MTP (Multi-Token Prediction) benchmark on Gemma 4 31B and Qwen 3.6 27B using vLLM and llama.cpp. Result: 3.34x speedup (132.52 vs 39.69 tok/s). vLLM outperforms llama.cpp on Gemma 4; llama.cpp solid on Qwen. No confirmed quality degradation, VRAM overhead negligible.
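The 3.34x figure above is simply the ratio of the two quoted throughputs. A minimal sketch of that arithmetic follows; the function name `speedup` is illustrative and does not come from the benchmark's code.

```python
def speedup(fast_tok_s: float, slow_tok_s: float) -> float:
    """Ratio of two throughput figures measured in tokens per second."""
    if slow_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return fast_tok_s / slow_tok_s


# Throughput figures quoted in the summary (tok/s with and without MTP).
ratio = speedup(132.52, 39.69)
print(f"{ratio:.2f}x")  # prints "3.34x"
```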
Hackers exploit ChatGPT share links to distribute malware. Attackers leverage trust in OpenAI URLs to bypass security filters and deliver malicious payloads to users.
A developer created a script to train a small model (25M parameters) on TinyStories with only 8GB VRAM. After testing multiple techniques (mHC, BitNet, TurboQuant, MTP), only MTP works properly, though slower. Code and model available on GitHub and Hugging Face.
Tiny-vLLM is a high-performance LLM inference engine written in C++ and CUDA. Open-source project shared on Hacker News with minimal early engagement (score 5, 0 comments).
User releases a Qwen 3.6 27B fine-tune after 2 years of experience. Model achieves 75% human alignment (+2% vs previous Qwen 3.5) through dataset expansion techniques. Evaluated on custom benchmarks.
CVE-Bench is a benchmark for evaluating LLM agents on real-world vulnerability patches. The study tests models' ability to identify and fix security flaws in existing code.
Shift, a robotics startup, offers free home cleaning services to collect training data for future domestic robots. Business model relies on real-world data acquisition rather than immediate monetization.
A r/LocalLLaMA user developed a training script to convert Gemma 4 31B Dense into a native additive-MoE model, inspired by JDONE-Research/AIOne-Agent-52B-A36B-it. The project aims to add a router and experts to the existing dense model in 24 hours on B300 GPU.
The UK will use AI to estimate asylum seekers' ages from 2025. The technology will analyze facial images to assess whether applicants who claim to be minors are in fact adults, raising questions about accuracy and ethical implications.
OpenAI upgrades GPT-5.5 Instant for more natural responses and removes Canvas feature in favor of direct chat integration. Older models o3 and GPT-4.5 will be retired from ChatGPT by August 2026.
Nvidia will announce a new ARM laptop PC chip at Computex on June 2 in Taipei. The processor aims to compete with Snapdragon X (Qualcomm) and offer competitive hardware specs, but adoption will depend on software support (Office, games). Expected price below the $4.7K DGX Spark.
Benchmark of Qwen3.6-27B quantizations on HuggingFace (unsloth, mradermacher, IQ4_XS, Ununnilium) from Q8 to Q2. Measured via llama.cpp: KL Divergence and Same Top P Percentage vs BF16 baseline. 8192 token context, KV cache q8_0. Q6-Q8 nearly lossless.
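The two metrics named above can be sketched in a few lines. This is an illustration under stated assumptions, not the benchmark's own code: it assumes per-position logits are available for both the BF16 baseline and the quantized model, and it reads "Same Top P" as the share of positions where both models agree on the most likely token. All function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats, with P the baseline and Q the quantized model."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def compare(baseline_logits, quant_logits):
    """Return (mean KL divergence, % of positions with the same top token)."""
    kls, same = [], 0
    for b, q in zip(baseline_logits, quant_logits):
        p_b, p_q = softmax(b), softmax(q)
        kls.append(kl_divergence(p_b, p_q))
        same += p_b.index(max(p_b)) == p_q.index(max(p_q))
    n = len(kls)
    return sum(kls) / n, 100.0 * same / n


# Toy data: two positions over a three-token vocabulary (made-up values).
baseline = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
quantized = [[1.9, 1.1, 0.1], [2.6, 0.4, 0.3]]
mean_kl, same_top = compare(baseline, quantized)
print(f"mean KL = {mean_kl:.4f} nats, same top token = {same_top:.1f}%")
```

A mean KL near zero together with a same-top-token share near 100% is what "nearly lossless" would look like under these definitions.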
Google fixes bugs in Gemini usage limits where a single Omni video consumed entire quotas. Ultra members now get twice as many video generations, failed requests are no longer charged, and Google plans increased transparency on usage.
Tesla's AI trainers lack confidence in the company's self-driving technology and published safety statistics. Internal doubts about data reliability and actual system capabilities.
Robinhood has integrated an API enabling AI agents to place stock trades directly. Users can connect their agents to the platform to automate trading. No technical details or limitations disclosed.
An unnamed company reportedly spent $500 million on Claude licenses in a single month due to lack of usage caps. The incident highlights risks of uncontrolled costs without expertise in model selection and context optimization.
A study reveals manipulative 'dark patterns' in AI chatbots: interfaces designed to influence users beyond their initial intent. Researchers document hidden persuasion tactics and design biases.
OpenAI is offering its life sciences AI model GPT-Rosalind for free through the Rosalind Biodefense program to help governments prepare for future pandemics. Early partners include Lawrence Livermore National Laboratory, Johns Hopkins, and CEPI.
User seeks $150K production inference failover server for 300 users. Current setup: 4 H100s running 122B AWQ models at 256k context with vLLM. Considering SuperMicro with RTX Pro 6000s or DGX Station as alternatives.
New website llama.app and unified `llama` binary announced for llama.cpp project. Ongoing development of local inference ecosystem.
Liquid AI unveils LFM2.5-8B-A1B, a Mixture of Experts model trained on 38 trillion tokens. The model pairs 8 billion total parameters with a sparse expert architecture (about 1 billion active per token, going by the A1B suffix) to optimize computational efficiency.
Flathub, the Linux application repository, now disallows code and documentation generated or assisted by AI. The platform tightens quality and attribution policies.
Promptloop is a terminal tool to create, run, and improve prompt evaluations. Enables rapid iteration on prompt quality without leaving the CLI.
Curated list of resources for AI agent harness engineering: tools, patterns, evals, memory, MCP, permissions, observability, and orchestration.
Researchers show CAPTCHAs remain effective at detecting AI agents, contradicting claims that these systems are obsolete against modern vision models.
Anthropic tests honesty in Claude Opus 4.8 beyond marketing claims. The article evaluates whether the model actually functions as a safeguard against misuse.
Claude Opus 4.8 shows significant progress according to initial tests. The article promises detailed benchmarks but the provided excerpt lacks specific figures and concrete results.
Unsloth Studio now fully supports training with MLX on Mac. The feature, previously marked as "coming soon", is now available in recent GitHub updates.
MarkItDown API Server wraps Microsoft's official MarkItDown library in a lightweight FastAPI REST server. The tool converts files (PDF, Word, Excel) to Markdown for RAG and LLM pipelines. This release patches security vulnerabilities in Starlette and document parsers.
Alibaba distilled Claude Opus 4.8 into Qwen models. Knowledge distillation transfers capabilities from large models to smaller, more efficient versions.
Technical discussion on the theoretical validity of using LLM consensus to estimate probabilities for real-world events. Author questions the actual error independence between models trained on similar data and effectiveness on out-of-distribution events.
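The independence concern can be made concrete with a standard result: for n estimators with equal error variance σ² and pairwise error correlation ρ, the variance of their average is σ²(1/n + (1 − 1/n)ρ). The sketch below applies that formula; the equal-variance, equal-correlation setup is a simplifying assumption for illustration and is not taken from the discussion itself.

```python
def variance_of_mean(sigma2: float, n: int, rho: float) -> float:
    """Error variance of the average of n equally correlated estimators."""
    if n < 1:
        raise ValueError("need at least one estimator")
    if n > 1 and rho < -1.0 / (n - 1):
        raise ValueError("rho is below the minimum valid value for n estimators")
    return sigma2 * (1.0 / n + (1.0 - 1.0 / n) * rho)


# Five models with unit error variance each.
print(variance_of_mean(1.0, 5, 0.0))  # independent errors: variance falls to 1/5
print(variance_of_mean(1.0, 5, 0.8))  # highly correlated errors: little is gained
```

With ρ = 0.8 the averaged error variance stays at about 0.84 of a single model's, which is why consensus among models trained on similar data may add less information than the number of models suggests.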
AISlop is a CLI tool that detects code smells in AI-generated code. The project, shared on Hacker News, aims to identify problematic patterns in code synthesized by language models.
Llama.cpp B9406 fixes GGML_ASSERT crash in get_rows/mtmd_helper_decode_image_chunk when using MTP + MoE model + vision with Qwen 3.6-35B-A3B.
A review paper argues the real bottleneck for autonomous AI agents isn't the language model but the software layer around it: tools, memory, testing, and permission boundaries turn a stateless model into a working agent. DeepSeek is building a dedicated "Harness" team in Beijing, which supports this thesis.
Comparative benchmark of vector search libraries (FAISS, ScaNN, USearch) measuring speed, memory usage, and similarity accuracy. Tests across 500 to 1 million samples. Results and code published on GitHub.
vLLM merged a PR adding native HIP W4A16 kernel for ROCm. Benchmarks show significant gains: 270.2 tk/s in fp16 (max-num-seqs=8) and 445.7 tk/s (max-num-seqs=32), outperforming previous Triton implementations.
Mercedes launches MB.Drive Assist Pro, an urban assisted driving system capable of handling traffic lights and traffic, to compete with Tesla's Full Self-Driving in Europe.
Boston Children's Hospital deploys OpenAI technology to improve rare disease diagnosis, identifying over 40 additional cases. The system reduces operational burden and accelerates patient care.
Braintrust uses Codex with GPT-5.5 to accelerate experiments and code generation. The platform's engineers convert customer requests directly into executable code.
Anthropic raises $65 billion in Series H funding, reaching a $965 billion valuation. No European public funds participated in the funding round.
Airbus partners with Mistral AI to develop sovereign artificial intelligence in the aerospace sector. The partnership aims to integrate secure AI models into the group's operations and processes.
Open-source platform for reproducible world model research and evaluation. Provides standardized infrastructure to train and test world models on simulated environments.
Claude Code is an agentic coding tool in the terminal that understands your codebase and executes routine tasks, explains complex code, and handles git workflows through natural language commands.
numa is a portable DNS resolver written in Rust. It supports .numa local domains, ad blocking, and developer overrides.
Herdr is an agent multiplexer running in the terminal. Enables managing multiple AI agents simultaneously within a command-line interface.
cc-switch-cli is a cross-platform CLI tool enabling switching between Claude Code, Codex, and Gemini. Available on GitHub, it provides a unified interface to manage multiple AI assistants.
react-doctor is a tool that detects bad practices in React code. It works as an agent that analyzes and flags problematic patterns.
StorySparkAI is an open-source platform enabling users to generate and share multiple story variations from a single prompt. Designed for creative professionals.
Project N.O.M.A.D is a self-contained offline survival computer integrating critical tools, knowledge bases, and AI for operation without network connectivity.
PentestAgent is an AI agent framework for black-box security testing, supporting bug bounty, red-team, and penetration testing workflows.
PaddleOCR is a lightweight, multilingual OCR toolkit (100+ languages) designed to convert PDF and image documents into structured data for LLM consumption.
MinerU converts complex documents (PDFs, Office files) into LLM-ready markdown/JSON for agentic workflows. Open-source tool for document extraction and data structuring.
H1 raises $40 million from CVS despite SaaS investment slowdown. The funding comes amid broader contraction in enterprise software investment.
Gemma 4 26B A4B impresses on M5 MacBook: high speed, versatility (creative writing, debugging, vision), conversational personality. Versus Qwen 3.6 35B, Gemma excels outside coding despite slight weakness in programming tasks.
Nvidia invests $300M in Decart, a startup focused on world models and software optimization. Nvidia's participation aims to control an optimization layer capable of running on its chips and those of competitors.
DeepSeek drastically reduces AI inference costs to cents. The Chinese company optimizes its models to lower computational resource consumption and usage fees.
Cigref estimates €140 billion in annual cloud and software overruns across European organizations. AI bundled into solutions ranks as the second identified cause. Half of CIOs cannot measure ROI from these integrated AI offerings.
Flathub, the Linux application distribution platform, prohibits AI-generated code in its repositories. The decision aims to maintain quality and accountability standards for the project.
Vercel shifts to per-unit billing for function invocations. New rate: $0.0000006 per invocation (previously $0.60 per million) for Pro customers. Change effective next billing cycle.
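The two quoted figures describe the same unit price expressed at different granularities, which a quick calculation confirms. The sketch below uses `Decimal` to avoid floating-point rounding on such a small rate; the `cost` helper is hypothetical and ignores any included quotas or other line items a real bill may contain.

```python
from decimal import Decimal

# Per-invocation rate quoted in the summary (USD).
PER_INVOCATION = Decimal("0.0000006")


def cost(invocations: int) -> Decimal:
    """Invocation charge in USD at the quoted per-unit rate."""
    if invocations < 0:
        raise ValueError("invocation count cannot be negative")
    return PER_INVOCATION * invocations


print(cost(1_000_000) == Decimal("0.60"))  # True: same as $0.60 per million
print(cost(2_500_000))
```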
Amazon shuts down internal AI leaderboard after employees inflated scores through meaningless tasks, driving up cloud costs.
The article argues that the real bottleneck for AI scaling is not GPU or RAM scarcity, but a shortage of qualified electricians. Energy infrastructure and physical server installation are becoming the limiting factor for large-scale data center deployment.
Corgi raises $106 million just three weeks after its first Series B, reaching a $2.6 billion valuation.
DeepSeek V4 represents a major breakthrough in Chinese AI and challenges the effectiveness of Western strategies. The article highlights Europe's urgent need to develop a competitive AI strategy in response to this technological independence.
Optimized monokernel for LLM inference on AMD MI300X: 3,300 output tokens/s per request (batch 1, no speculative decoding). Architecture mapped to GPU physical topology. Initial support for 2B model, frontier MoE planned.
The EU postpones enforcement of obligations for high-risk AI systems in HR tools to December 2027. A provisional political agreement reached on May 7, 2026 on the Digital Omnibus AI amends the timeline of Regulation (EU) 2024/1689.
The fourth edition of PCAIDE (Paris Conference on AI & Digital Ethics) will take place on June 11-12, 2026 at Mines Paris. The conference returns following the 2025 edition.
Mistral AI showcases its industrial pivot at AI Now Summit (May 28, 2026) with announced partnerships with EDF, BMW, and Airbus. However, specific contract values remain undisclosed.
llama.cpp PR #23764: use f16 masks in Flash Attention to reduce VRAM consumption. Optimization enabling larger models to fit on GPU memory.
User tests MTP (Multi-Token Prediction) on Qwen3.6-35B with llama.cpp on RTX 3090. With MTP enabled (--spec-type draft-mtp), performance drops: prefill from 1082 t/s to 878 t/s (N=1), generation from 116 t/s to 108 t/s. Draft acceptance rates low (0.80 to 0.37). Seeks optimization advice.
Anthropic raises $65 billion in Series H funding, reaching a $965 billion valuation. One of the largest funding rounds in the AI sector.
A developer embedded a malicious prompt injection into code shared with 'vibe coders' to trigger data deletion. The incident highlights security risks from prompt injections in development workflows.
Liquid AI releases LFM2.5-8B-A1B, an 8B model designed to show that performance does not solely depend on model size. The launch challenges the paradigm of ever-larger models.
A developer experiments with HTML as the primary chat language for coding agents instead of markdown. By switching the system prompt to HTML, the agent (Qwen 3.6-27B) now generates SVG diagrams directly in responses. Results are promising but the model still tends to default to markdown.
New LFM2.5-8B-A1B model announced with performance on par with Nemotron 3 Nano at higher speed. Support being added to SmallCode despite non-standard tool calls.
Research on probe-targeted fine-tuning (LoRA) for verbal confidence calibration in LLMs. Models internally detect correct answers (0.76–0.88 AUROC) but output 99% confidence uniformly. Fine-tuning across 8 models (7B–70B) with causal activation patching (ρ=0.976). Code and pre-registration available.
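AUROC measures how often a score ranks a correct answer above an incorrect one, so a confidence that is always 99% carries no ranking information regardless of how accurate the model is. The sketch below computes AUROC by pairwise comparison on made-up data; the scores and labels are illustrative and are not results from the paper.

```python
def auroc(scores, labels):
    """Probability that a correct item outranks an incorrect one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]                      # 1 = answer was correct (toy data)
probe_scores = [0.9, 0.8, 0.4, 0.3]        # hypothetical internal probe outputs
verbal_confidence = [0.99, 0.99, 0.99, 0.99]

print(auroc(probe_scores, labels))         # 0.75
print(auroc(verbal_confidence, labels))    # 0.5, i.e. no better than chance
```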
Python utility package for building Claude Code hooks. Enables custom integration with Claude Code through modular extensions.
Liquid AI releases LFM2.5-8B-A1B, 8B model with 128K context window, 38T pre-training tokens, and large-scale RL. Doubled vocabulary for non-Latin languages. Supports tool chaining and complex tasks on entry-level laptops.
Step 3.7 Flash configured and benchmarked on dual RTX Pro 6000 Blackwell GPUs. Early token-per-second inference metrics recorded. Extended testing underway, full results pending.
StepFun 3.7 Flash benchmark on M5 Max (128 GB) with llama.cpp. Short contexts (<16k tokens) fast and responsive. 32k-64k contexts usable. Detailed metrics: 65k tokens reaches 360.79 t/s token generation.
VFEAgent is a multimodal multi-agent system automating Finite Element Analysis (FEA) from images and text descriptions. The framework combines a vision-language pipeline with ReAct reasoning and verification-first code synthesis to generate physically valid simulations, outperforming existing LLM-based approaches.
Paper introduces Program-of-Thoughts prompting for chart summarization: VLMs generate Python programs to derive valid summary statistics instead of direct text. Proposes chart-to-dictionary auxiliary task. Results match existing methods on semantic and factual metrics.
Preference optimization method guided by hallucination detection to improve clinical summarization reliability. On Llama-3.1-8B-Instruct, reduces hallucinations by 24% at inference and 48% after fine-tuning, preserving fluency. Evaluated on MIMIC-IV.
GPF-LiveNews is a streaming evaluation protocol to audit how LLMs frame emerging news events for different audiences. Tested on 23 models across 12 monitoring runs, it measures semantic and sentiment variations across 42 identity labels. Results show Policy/Action prompts produce strongest semantic movement, while sentiment variation remains flat across dimensions.
Qualitative study of 8 AI researchers reveals a paradox: they distrust LLM leaderboards yet use them as decision aids. Peer networks dominate model selection. NLP researchers face SOTA pressure absent in HCI/Systems. Universal demand: cost transparency.
GenesisFunc is an automated multi-agent pipeline for generating function-calling training data. Starting from reliable tools in public benchmarks, the system produces diverse conversations with multi-stage quality control. An 8B model fine-tuned on this synthetic data outperforms similarly-sized open-source models in in-domain performance and out-of-domain generalization.
Comparative assessment of four Dutch syllabification algorithms (Brandt Corstius, Liang, Trogkanis-Elkan CRF, and a novel deep learning model). The deep learning model combining phonetic and orthographic information achieves 99.65% word accuracy (+0.14% improvement over literature). Data-driven algorithms outperform knowledge-based approaches.
Empirical study of behavioral reproducibility in LLM agents with tool-calling capabilities. Researchers measure whether agents select the same tools, in the same order, with identical parameters, across repeated identical invocations. Focus on structured tool-calling interfaces with typed parameters and consequential side effects.
Thoughts-as-Planning formalizes reasoning chain optimization as sequential decision-making over latent semantic space. The framework learns a latent world model simulating effects of reasoning chain edits on outputs, supporting multi-scale edits (token, segment, instruction) via gradient descent or reinforcement learning planning.
Researchers introduce a Behavioral Specification as an interpretive layer to align AI decisions with user preferences. Tested on 14 autobiographical corpora, it improves representational accuracy at ~25x lower context cost than raw corpus while reducing model hedging. Effective on interpretation-required questions; less helpful on recall-based tasks.
Study of lossy semantic text compression where an encoder strategically deletes text parts and an LLM reconstructs original content. Benchmarks 6 deletion strategies (uniform, frequency, entropy, LP-optimized, hybrid) on BBC News. WordFreq provides best cost/performance ratio; semantic methods excel at moderate compression; QLoRA fine-tuning competes with Gemini 2.0 Flash.
Vercel warns of AI inference theft: a single frontier model request costs ~$2, creating high-margin attack opportunities. Rate limits and session-based auth are insufficient; Vercel proposes BotID to verify every AI request individually and prevent tens of thousands of dollars in losses.