Archives

May 2026

3147 articles

Reddit r/LocalLLaMA·

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

A paper published on arXiv shows honesty in small open-source models drops from 35% to 0% by changing prompt tone. When asked to solve mathematically impossible coding problems, models admit impossibility 33% of the time in neutral language but 0% under pressure. Internal analysis reveals each tone leaves a distinct signature in the network's deepest layers.

Papers · Alignment · AI safety
SIG 72 · HYP 35
Reddit r/LocalLLaMA·

LLM planner - pick a rig for your use-case/model/budget, or pick models for your rig. 60+ builds, 50+ models, 130+ cited t/s sources, 150+ reviewer YouTube videos, idle+active watts, multi-region prices, regular updates.

LLM Planner is an interactive guide for matching hardware to open-weights models. It covers 60+ build configs and 50+ models, with sourced tokens/sec, power draw, multi-region pricing, and 150+ reviewer YouTube videos. It works in both directions: "which rig for this model/budget" or "what models run on my GPU". Data is updated weekly in a public GitHub repo.

Open source · Tools · Benchmarks
SIG 75 · HYP 25
GitHub Trending·

dotnet/skills

GitHub repository providing skills to assist AI coding agents with .NET and C#. Resources for integrating .NET development capabilities into autonomous agent workflows.

AI Agents · Code generation · Open source
SIG 45 · HYP 15
GitHub Trending·

ryoppippi/ccusage

ccusage is a CLI tool to analyze token usage and costs from coding agents using local data.

AI Agents · Code generation · Tools
SIG 35 · HYP 15
GitHub Trending·

kata-containers/kata-containers

Kata Containers is an open source project building lightweight Virtual Machines that provide container-like performance with VM-level workload isolation and security.

Open source · Infrastructure
SIG 45 · HYP 15
GitHub Trending·

DataDog/pup

Datadog launches Pup, a CLI companion for AI agents with 200+ commands across 33+ Datadog products.

AI Agents · Tools · Infrastructure
SIG 65 · HYP 35
GitHub Trending·

google-gemini/gemini-cli

Open-source tool integrating Gemini directly into the terminal. AI agent enabling interaction with Google's model via CLI.

Gemini · AI Agents · Tools
SIG 45 · HYP 25
GitHub Trending·

ChromeDevTools/chrome-devtools-mcp

Chrome DevTools MCP integrates Chrome's developer tools into a Model Context Protocol interface for coding agents. Enables agents to inspect, debug, and interact with web pages in real-time.

AI Agents · MCP · Code generation
SIG 65 · HYP 25
GitHub Trending·

software-mansion/argent

Argent is an agentic toolkit to control, debug, and profile iOS and Android apps. Built by Software Mansion.

AI Agents · Tools · Open source
SIG 65 · HYP 25
GitHub Trending·

google-labs-code/stitch-skills

Stitch-Skills is a library of Agent Skills designed for the Stitch MCP server. Each skill follows the open Agent Skills standard, compatible with Claude Code, Gemini CLI, Cursor, and Antigravity.

AI Agents · MCP · Claude Code
SIG 65 · HYP 25
GitHub Trending·

google/adk-samples

Google releases adk-samples, a collection of sample agents built with Agent Development Kit (ADK). Open-source repository to explore agent development capabilities.

AI Agents · DeepMind · Open source
SIG 45 · HYP 15
GitHub Trending·

antoinezambelli/forge

Forge is a Python framework for self-hosted LLM tool-calling and multi-step agentic workflows. Available as open-source on GitHub.

AI Agents · Multi-agent · Open source
SIG 45 · HYP 25
GitHub Trending·

teng-lin/notebooklm-py

Unofficial Python API for Google NotebookLM providing full programmatic access to features, including those not exposed in web UI. Supports CLI and integration with AI agents (Claude Code, Codex, OpenClaw).

DeepMind · AI Agents · Code generation
SIG 65 · HYP 25
GitHub Trending·

aiming-lab/AutoResearchClaw

AutoResearchClaw automates end-to-end research: idea generation, experiments, writing, and paper publication without human intervention. Fully autonomous and self-evolving AI agent system.

AI Agents · Multi-agent · Papers
SIG 45 · HYP 65
GitHub Trending·

openai/whisper

OpenAI Whisper is a speech recognition model trained on 680,000 hours of multilingual weakly supervised data. The GitHub repository includes code, pre-trained models, and performance benchmarks across multiple languages and acoustic conditions.

OpenAI · Voice · Open source
SIG 85 · HYP 15
arXiv cs.LG·

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

CSA (Conformal Selective Acting) is a deployment wrapper for RLVR-fine-tuned LLMs guaranteeing per-round risk control without pooling across deployments. Tested on 480 specialist streams and 10,300 Expert-Iteration rounds with LoRA, CSA maintains a Ville e-process per threshold and achieves selective-risk bound R_T^act ≤ α+O(N_T^{-1/2}) with anytime pathwise validity.

Reinforcement learning · AI safety · Evals
SIG 78 · HYP 15
arXiv cs.CL·

Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs

arXiv study on Chain-of-Thought (CoT) impact on gender bias in LLMs. Researchers combine benchmark evaluation, mechanistic interpretability, and reasoning chain analysis. Finding: CoT does not consistently reduce bias gaps; observed improvements stem from memorization rather than genuine understanding, with gender bias remaining embedded in hidden representations.

Reasoning · AI safety · Alignment
SIG 78 · HYP 15
arXiv cs.CL·

Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

Two-stage pipeline for captioning cultural images in Indigenous languages: Qwen2.5-VL generates an intermediate Spanish caption, then Gemini 2.5 Flash produces the target-language caption via retrieval-augmented prompting. It achieves improvements over the baseline of 164.1% (Bribri), 131.7% (Guaraní), and 122.6% (Orizaba Nahuatl). Overall winner of the AmericasNLP 2026 shared task.

Vision · RAG · Gemini
SIG 78 · HYP 25
arXiv cs.CL·

Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

A new metric, d_NTP, evaluates task vector quality in ICL by measuring the alignment of next-token probability distributions. The Linear Task Vector (LTV) method minimizes d_NTP via closed-form linear regression, improving accuracy by 9.2% across 8 benchmarks and 5 LLMs while reducing inference latency. Task vectors also transfer across model scales (+6.4% for the smaller model).

Prompt engineering · Reasoning · Benchmarks
SIG 78 · HYP 15
arXiv cs.LG·

Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding

Chronicle is a 324M-parameter multimodal foundation model trained from scratch on natural language and time series in a unified architecture. Both modalities share the same transformer blocks and attention mechanisms. It matches Gemma-3-270M on 19 NLU tasks, sets new benchmarks on 24 UCR/UEA datasets, and outperforms supervised fusion baselines on Time-MMD.

Benchmarks · Papers · Reasoning
SIG 82 · HYP 25
arXiv cs.LG·

OmniISR: A Unified Framework for Centralized and Federated Learning via Intermediate Supervision and Regularization

OmniISR is a unified framework for centralized learning (CL) and federated learning (FL) built on intermediate supervision and regularization. It uses mutual information to align internal covariate shifts and negative entropy to regularize overconfident predictions. The paper gives a theoretical O(1/sqrt(T)) convergence guarantee, and experiments reduce the CL-FL gap by 22.60%.

Reinforcement learning · Alignment · Papers
SIG 78 · HYP 15
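
To illustrate the negative-entropy regularizer mentioned in the OmniISR summary, here is a minimal Python sketch. It is not the paper's formulation, which the summary does not give. It assumes the common "confidence penalty" form, cross-entropy plus `beta` times the negative entropy of the predicted distribution. The names `regularized_loss` and `beta` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_loss(logits, target, beta=0.1):
    """Cross-entropy plus beta * negative entropy (assumed form).

    The negative-entropy term is largest for peaked, overconfident
    predictions, so minimizing the total nudges the model toward
    less extreme probabilities.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    return cross_entropy + beta * (-entropy(probs))

confident = softmax([5.0, 0.0, 0.0])
uniform = softmax([0.0, 0.0, 0.0])
print(entropy(confident) < entropy(uniform))  # peaked prediction has lower entropy
```

With `beta=0` this reduces to plain cross-entropy; larger values trade some fit for softer predictions.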
arXiv cs.CL·

Beyond Semantic Similarity: A Two-Phase Non-Parametric Retrieval Workflow for Corporate Credit Underwriting

Two-phase RAG system for corporate credit analysis: phase 1 combines lexical and dense multilingual retrieval; phase 2 applies an adaptive controller and LLM-as-Judge scoring based on analytical utility rather than semantic similarity. Deployed on-premise over a proprietary multilingual corpus. In production, document review time fell from hours to 3 minutes across 800+ analysts.

RAG · Vector search · Embeddings
SIG 82 · HYP 15
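
The two-phase shape described in the credit-underwriting summary can be sketched roughly as follows. This is an illustrative Python toy, not the paper's system. The term-overlap lexical score, the mixing weight `alpha`, and the `judge` callable (standing in for LLM-as-Judge utility scoring) are all assumptions, and the adaptive controller is omitted.

```python
import math

def lexical_score(query, doc):
    """Fraction of query terms that appear in the document (toy lexical match)."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def phase1(query, query_vec, docs, k=2, alpha=0.5):
    """Phase 1: blend lexical and dense scores, keep the top k texts.

    docs is a list of (text, embedding) pairs; alpha is a hypothetical weight.
    """
    scored = [
        (alpha * lexical_score(query, text) + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

def phase2(candidates, judge):
    """Phase 2: rerank candidates by a utility score from a judge callable."""
    return sorted(candidates, key=judge, reverse=True)

docs = [
    ("debt covenant ratio", [1.0, 0.0]),
    ("weather report", [0.0, 1.0]),
    ("covenant breach notice", [0.9, 0.1]),
]
shortlist = phase1("covenant ratio", [1.0, 0.0], docs)
ranked = phase2(shortlist, judge=lambda text: 1.0 if "breach" in text else 0.0)
print(ranked)
```

The point of the second phase is that the most similar document need not be the most useful one: here the judge promotes the breach notice over the closer lexical match.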
arXiv cs.LG·

GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

GraphDiffMed presents a medication recommendation framework using dual-scale Differential Attention v2 with pharmacological constraints. Tested on MIMIC-III, the model filters noise at intra-visit and inter-visit levels while incorporating drug-drug interactions, outperforming baselines on recommendation quality and safety metrics.

Benchmarks · Papers · AI safety
SIG 72 · HYP 18