Llama Fine-tuning Infrastructure

ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention

ThriftAttention applies selective mixed precision to FP4 attention over long contexts. The method reduces memory consumption and accelerates inference by varying the precision level across attention regions according to how critical they are.

SIG 65 · HYP 25

Hacker News (AI)·May 25

A successful Japanese trial of a ramjet engine designed for Mach‑5 aircraft

Japan has successfully tested a ramjet engine designed for Mach-5 aircraft. The trial validates hypersonic propulsion technology, a key milestone toward next-generation hypersonic aircraft.

Infrastructure

SIG 65 · HYP 15

Llama Open source AI safety

The Financial Times has published an article about Heretic

The Financial Times reports that Heretic, a GitHub tool, removes the guardrails from Llama 3.3 in under 10 minutes. Its creator, Philipp Emanuel Weidmann, confirms that 3,500 'decensored' models have been created, with 13 million downloads since launch.

SIG 65 · HYP 45

GitHub Trending·May 25

anthropics / claude-cookbooks

Anthropic releases claude-cookbooks, a collection of notebooks and recipes demonstrating practical and creative ways to use Claude.

Claude Prompt engineering

SIG 65 · HYP 25

GitHub Trending·May 25

AlexsJones / llmfit

llmfit is a CLI tool that tests hundreds of LLMs and providers against your hardware, identifying with a single command what can run locally.

Tools Open source Infrastructure

SIG 65 · HYP 25

GitHub Trending·May 25

Zackriya-Solutions / meetily

Meetily is an open-source, self-hosted meeting assistant built in Rust. It offers transcription 4x faster than Whisper/Parakeet, speaker diarization, and Ollama-based summarization, with 100% local processing and no cloud required.

Open source Voice Tools

SIG 65 · HYP 35

MCP AI Agents Open source

I made a local-first MCP tutorial repo with node-llama-cpp and a custom agent loop

A learning repo, "MCP from Scratch", teaches the Model Context Protocol in plain Node.js, from raw JSON-RPC up to a working local agent loop (plan → act → observe) built on node-llama-cpp and GGUF models. It is designed to expose the underlying mechanics without heavy abstractions.

SIG 65 · HYP 25

arXiv cs.AI·May 25

Solving the Aircraft Disassembly Scheduling Problem

This paper addresses disassembly scheduling for end-of-life aircraft. It proposes Constraint Programming (CP) and Mixed-Integer Programming (MIP) models that handle thousands of tasks subject to precedence constraints, technician certifications, aircraft balance, and space limitations, and evaluates them on real instances of up to 1,450 tasks from an industrial partner.

Benchmarks

SIG 65 · HYP 15

arXiv cs.AI·May 25

CP or DP? Why Not Both: A Case Study in the Partial Shop Scheduling Problem

This paper combines Dynamic Programming (DP) and Constraint Programming (CP) to solve the Partial Shop Scheduling Problem. DP serves as the primary search framework, while CP contributes global constraint propagation. The approach also integrates anytime strategies and Large Neighborhood Search schemes.

Benchmarks Reasoning

SIG 65 · HYP 15

Llama Code generation Infrastructure

llama.cpp has a clever trick for speeding up KV cache decode

llama.cpp includes a KV cache optimization that re-sends generated tokens to the cache immediately instead of waiting for the next prompt, improving responsiveness. One user reports latency dropping from 5-30 s to near-instant with Qwen 3.6-35B on an RX 7900 XTX (~100 tps).

SIG 65 · HYP 25

OpenAI Blog·May 25

OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership

OpenAI partners with Grupo Folha and Grupo UOL to integrate trusted Brazilian journalism into ChatGPT. Content will be transparently attributed.

OpenAI Business

SIG 65 · HYP 25

Qwen Code generation Open source

qwen3.6-35b-a3b-mtp running on GTX 1060 6GB

A user runs Qwen 3.6-35B-A3B-MTP on a GTX 1060 6GB via LM Studio. Setup: Q4_K_XL quantization, 131k context, and 41 layers offloaded to the GPU; prefill reaches 130-150 tps and decode 16 tps, which is usable for chat on legacy hardware.

SIG 65 · HYP 15

Simon Willison·May 24

Quoting Armin Ronacher

Armin Ronacher (Pi creator) denounces poorly prompted, LLM-generated bug reports filed against his open-source project. These reports contain inaccurate yet confident conclusions, fake minimal reproductions, and wrong root-cause guesses. He asks contributors to limit issues to observed facts: the command run, the expected outcome, the actual outcome, and the exact logs.

Open source AI Agents Prompt engineering

SIG 65 · HYP 45

Code generation Tools Open source

How I do use the recent llama.cpp native tools to do web rag a.k.a. web_fetch (or anything else for the matter) directly from inside the llama-server's webui

A llama.cpp user implements a secure web RAG workflow by enabling the native server tools (exec_shell_command) behind layered sandboxing: firejail, a dedicated Linux user, and an Alpine OCI container. This lets a Qwen 3.6-35B model execute wget commands directly from the web UI to fetch and analyze content.

Llama RAG Tools

SIG 65 · HYP 25

GitHub Trending·May 24

anthropics / knowledge-work-plugins

Anthropic releases an open-source repository of plugins for Claude designed for knowledge workers. Plugins enable integration of Claude into professional workflows.

Claude Tools Open source

SIG 65 · HYP 20

GitHub Trending·May 24

Aider-AI / aider

Aider is an AI pair-programming tool that runs in the terminal, letting developers write and edit code together with an AI directly from the command line.

SIG 65 · HYP 25

The Decoder·May 24

Why you shouldn't leave model selection on default in Copilot, Gemini and other AI tools

Mathematician Adam Kucharski shows that Microsoft Copilot invents country-based stereotypes when analyzing identical datasets labeled with different countries. Reasoning models catch the trick, but only if users explicitly select them instead of relying on the default settings.

GPT Gemini Reasoning

SIG 65 · HYP 35

The Decoder·May 24

Anthropic may keep supplying Claude to the NSA despite being flagged as a supply chain risk by the Pentagon

Anthropic will likely continue supplying Claude to the NSA despite the Pentagon flagging it as a supply chain risk. Intelligence agencies lack Nvidia's latest Grace Blackwell chips; Anthropic's "Mythos" model reportedly runs on older hardware. The controversial "any lawful use" clause is not part of the deal.

Claude Anthropic Regulation

SIG 65 · HYP 35

I built a local GUI for the TradingAgents framework — works with Ollama

A developer built a web GUI for TradingAgents, a multi-agent LLM stock-analysis framework. It replaces the CLI with a local interface supporting Ollama, OpenAI, Anthropic, Google, DeepSeek, and others, and adds live pipeline visualization, a report reader, a concise mode that cuts token usage by about 50%, and multi-session chat. Licensed under Apache 2.0.

SIG 65 · HYP 35

Benchmarks Tools Open source

TTS Benchmark Comparison (all known TTS up until May 2026)

A TTS benchmark comparison covering all known models through May 2026. Windows and Mac results are available, and Linux testing is underway. Results are published in a GitHub repo with an HTML results page.

SIG 65 · HYP 25

Embeddings for NVIDIA's Nemotron Personas

A user computed embeddings for NVIDIA's Nemotron-Personas dataset (millions of synthetic personas) using Qwen 0.6B. The precomputed vectors enable semantic search and persona clustering, and are available on Hugging Face along with a web demo.

Embeddings Qwen RAG

SIG 65 · HYP 25

Open source Infrastructure Code generation

NVFP4 + MTP - voilà on llama.cpp

NVFP4 quantization and MTP (multi-token prediction) can now be used together in llama.cpp (release b9297). Combining the two is expected to improve performance on NVIDIA GPUs.

SIG 65 · HYP 15

Run Chrome’s tiny Gemma4 (aka Gemini Nano) directly on PC without GPU

A Chrome extension that runs Gemini Nano (Gemma) locally on a PC without a GPU. It requires 16 GB of RAM, reaches about 20 tokens/s on a laptop, and allows 9,216 tokens per session. The one-click extension is available on the Chrome Web Store and from its GitHub repo.

Gemini Tools Open source

SIG 65 · HYP 25

Llama Code generation Open source

Made a package to install llama.cpp server binaries

A Python package that installs prebuilt llama.cpp server binaries. It addresses portability: you can deploy llama.cpp as a local subprocess without documenting build steps. Available on PyPI and GitHub, with support for standard llama.cpp flags and custom builds.

SIG 65 · HYP 15

Llama Open source Benchmarks

Llama.cpp VS LiteRT on a custom Xiaomi 12 Pro 24/7 Server (V2 Redesign)

A benchmark of llama.cpp vs Google's LiteRT on a custom 24/7 server built from a Xiaomi 12 Pro (Snapdragon 8 Gen 1). llama.cpp: 30.6 t/s prompt processing and 5.7 t/s generation at moderate CPU load. LiteRT: slightly faster generation, but it maxes out the CPU and draws more power. The build features copper/aluminum cooling, a custom safe PSU, and a 3D-printed case.

SIG 65 · HYP 25

Reddit r/MachineLearning·May 23

Open-source devtool for AI agent projects [P]

AgentLantern is an open-source devtool for AI agent projects. It provides three features: documentation generation from source code, static linting to detect configuration issues, and a pixel-art runtime viewer. It initially supports CrewAI, with plans to extend to other frameworks.

AI Agents Code generation Open source

SIG 65 · HYP 25

GitHub Trending·May 23

databricks-solutions / ai-dev-kit

Databricks releases ai-dev-kit, a toolkit for building coding agents. Maintained by Databricks Field Engineering, it provides components and patterns for constructing AI agents that generate and manipulate code.

SIG 65 · HYP 25

GitHub Trending·May 23

pydantic / pydantic-ai

Pydantic-AI is an open-source framework for building AI agents following Pydantic principles. It provides a structured approach to developing multi-agent systems with built-in data validation.

SIG 65 · HYP 35

GitHub Trending·May 23

crewAIInc / crewAI

CrewAI is an open-source framework for orchestrating autonomous AI agents in collaborative roles. It enables agents to work together seamlessly on complex tasks through collective intelligence.

SIG 65 · HYP 35

The Decoder·May 23

One of the world's top law schools draws a hard line against AI in legal education

UC Berkeley Law bans AI from nearly all graded work starting in summer 2026, including outlining, drafting, and proofreading. Only research use is permitted. The rationale: future lawyers must learn to think independently before they can use AI meaningfully.

Regulation Business

SIG 65 · HYP 25

The Decoder·May 23

Google CEO Pichai now calls links a "part" of search, redefining the web's role in its own product

Google CEO Sundar Pichai reframes links as a "part" of search rather than its foundation. Google is pivoting from a traffic distributor to an AI publisher, keeping users within its ecosystem and exercising editorial power over source selection.

DeepMind Business

SIG 65 · HYP 45