5859 articles
arXiv cs.AI·

Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning

VDSS, a multi-agent system for ventilator decision support, coordinates modular components through structured interfaces and contextual bandit preference learning from clinician feedback. Structured rejection triggers targeted replanning. Retrospective ICU validation shows higher recommendation acceptability and fewer interaction cycles.

Multi-agent · Reinforcement learning · AI Agents
SIG 72 · HYP 18
arXiv cs.CL·

ClimateChat-300K: A Multi-Modal Facebook Dataset for Understanding Diverse Perspectives in Climate Communication

ClimateChat-300K: dataset of 299,329 public Facebook posts on climate change (May 2020–May 2024), collected via CrowdTangle. 41 metadata features, 26,000+ global pages. Topic modeling and sentiment analysis identify 10 themes across 5 domains; emotionally charged and visually rich content drives highest engagement. Open resource for studying polarization and misinformation.

Benchmarks · Papers · Open source
SIG 72 · HYP 25
arXiv cs.CL·

When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

SCID-anchored benchmark of 555 semi-structured interviews evaluates 5 LLMs (including GPT-4.1 Mini and GPT-5 Mini) on psychiatric screening (anxiety, depression, PTSD). Accuracy 0.49–0.86, MCC 0.16–0.38. False negatives show that models downweight symptoms when functioning is preserved or social support is present, which points to a need for clinical validation before deployment.

Benchmarks · GPT · AI safety
SIG 72 · HYP 25
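
The gap between the accuracy and MCC ranges reported in the psychiatric screening entry above is easier to read with a worked example. The sketch below uses the standard Matthews correlation coefficient formula on a hypothetical, imbalanced confusion matrix; the counts are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT from the paper): 50 true cases out of 500,
# with most of the true cases missed (many false negatives).
example = dict(tp=10, fp=20, fn=40, tn=430)

print(f"accuracy = {accuracy(**example):.2f}")
print(f"MCC      = {mcc(**example):.2f}")
```

With counts like these, accuracy stays high because true negatives dominate, while MCC stays low because most positive cases are missed.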
arXiv cs.CL·

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

Unified framework for cost-performance optimization in LLM context management. Jointly evaluates task performance, token cost, and preprocessing reuse on 5,000 HotpotQA instances. Reduces effective token usage by 25% at comparable performance (F1≈0.78) and achieves 50% lower token cost with memory compression versus full-context prompting.

RAG · Benchmarks · Infrastructure
SIG 72 · HYP 18
arXiv cs.CL·

What Training Data Teaches RL Memory Agents: An Empirical Study of Curriculum Effects in Memory-Augmented QA

Empirical study on curriculum effects for RL memory agents in multi-session dialogue with external memory banks. Three training conditions tested (LoCoMo only, LoCoMo + LongMemEval, LongMemEval only) show curriculum composition shapes specialized skills rather than uniform performance scaling. Mixed curriculum achieves strongest overall F1.

Reinforcement learning · AI Agents · Reasoning
SIG 72 · HYP 15
Reddit r/LocalLLaMA·

I shipped a windows desktop app for running local LLMs with a button that turns your "no thats wrong" into actual LoRA training data

SEELS, a Windows desktop app for local LLMs, lets users correct model replies via a "Teach" button that accumulates corrections into a JSONL corpus, then triggers PEFT LoRA fine-tuning without terminal access. Includes local STT/TTS (Whisper/Piper), a hardware dashboard, and a 0.6B model pre-trained on 110 examples. The stable version is free; a pro tier (image/video gen, MCP) and a max tier (workflows, multi-GPU) are on the roadmap.

Fine-tuning · Open source · Tools
SIG 72 · HYP 35
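
The correction-to-corpus step described in the SEELS entry above could look roughly like the sketch below. The record fields (`prompt`, `rejected`, `corrected`) and both function names are hypothetical; the post does not specify the app's actual schema or code.

```python
import json
from pathlib import Path


def record_correction(path: str, prompt: str, rejected: str, corrected: str) -> None:
    """Append one user correction as a single JSON line (hypothetical schema)."""
    row = {"prompt": prompt, "rejected": rejected, "corrected": corrected}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")


def load_corrections(path: str) -> list[dict]:
    """Read the JSONL corpus back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An append-only JSONL file keeps each correction independent, so a later fine-tuning job can read the corpus line by line without loading or rewriting earlier records.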
Reddit r/MachineLearning·

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA [D]

Benchmark on 30 long PDFs (171 questions) comparing native vision-LLMs vs OCR pipelines for document QA. Claude Sonnet 4.5 used. LlamaCloud premium achieves 59.6% accuracy ($0.1885/query), native vision 52% ($0.2552/query, most expensive). Vision underperforms on charts/tables; premium OCR more robust. Vision-LLM has 7% intrinsic failure rate vs 0% for OCR after retries.

Vision · Benchmarks · RAG
SIG 72 · HYP 25
Reddit r/MachineLearning·

Per-pixel bounding-box regression + DBSCAN for handwritten word detection - visual walkthrough of WordDetectorNet [P]

WordDetectorNet uses per-pixel bounding-box distance regression + DBSCAN for handwritten word detection. Each pixel classified as a word pixel regresses 4 scalar distances, generating thousands of candidates merged via DBSCAN with distance = 1 − IoU. Architecture: ResNet18 → FPN-style decoder → 6 output channels per pixel (2 segmentation logits + 4 distances). Trained on IAM, 448×448 → 224×224.

Vision · Code generation · Open source
SIG 72 · HYP 18
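
The clustering step in the WordDetectorNet entry above (per-pixel box candidates grouped by DBSCAN with distance = 1 − IoU) can be sketched in plain Python. This is a minimal illustration and not the project's code: the distance order (top, bottom, left, right), the `eps` and `min_samples` values, and the per-cluster median merge are all assumptions.

```python
from statistics import median


def decode(x, y, top, bottom, left, right):
    """Turn a word pixel and its 4 regressed distances into a box (assumed order)."""
    return (x - left, y - top, x + right, y + bottom)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def dbscan(boxes, eps=0.3, min_samples=2):
    """Plain DBSCAN over boxes with distance = 1 - IoU. Label -1 means noise."""
    n = len(boxes)
    neighbors = [
        [j for j in range(n) if 1.0 - iou(boxes[i], boxes[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels


def merge_clusters(boxes, labels):
    """Collapse each cluster to one box via the coordinate-wise median (assumption)."""
    clusters = {}
    for box, label in zip(boxes, labels):
        if label != -1:
            clusters.setdefault(label, []).append(box)
    return [
        tuple(median(b[k] for b in members) for k in range(4))
        for _, members in sorted(clusters.items())
    ]


# Hypothetical per-pixel predictions: (x, y, top, bottom, left, right).
pixels = [
    (10, 10, 5, 5, 8, 12), (12, 11, 6, 4, 10, 10), (11, 9, 4, 6, 9, 12),  # word A
    (50, 10, 5, 5, 10, 10), (52, 10, 5, 5, 12, 9),                        # word B
    (90, 40, 1, 1, 1, 1),                                                 # stray pixel
]
candidates = [decode(*p) for p in pixels]
labels = dbscan(candidates)
words = merge_clusters(candidates, labels)
print(labels)
print(words)
```

Using 1 − IoU as the distance means boxes are grouped by how much they overlap, not by how close their centers are, so candidates voted for the same word by different pixels fall into the same cluster.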
Reddit r/LocalLLaMA·

Did a 30 runs of llama-bench to find optimal settings for my use case (Frigate and HomeAssistant) on my MI60 32gb VRAM GPU - two models tested Gemma4 and Qwen3.6 - Figured I'd share in case it helps anyone else

User ran 30 llama.cpp benchmarks on an MI60 32GB GPU to optimize Gemma 4 26B Q4_1 and Qwen3.6 35B Q4_0 for Frigate and HomeAssistant. Results: voice commands <1.2s, video summaries <18s. Systematic testing across KV cache depths (0, 1000, 6000 tokens) with 512-token prompt and 128-token generation.

Llama · Benchmarks · Code generation
SIG 72 · HYP 15
GitHub Trending·

mukul975/Anthropic-Cybersecurity-Skills

Repository of 754 structured cybersecurity skills for AI agents, mapped to 5 frameworks (MITRE ATT&CK, NIST CSF 2.0, MITRE ATLAS, D3FEND, NIST AI RMF). Compatible with Claude Code, GitHub Copilot, Cursor, Gemini CLI and 20+ platforms. 26 security domains. Apache 2.0 license.

AI Agents · Claude Code · AI safety
SIG 72 · HYP 25
GitHub Trending·

OpenPipe/ART

OpenPipe/ART: reinforcement learning framework for multi-step agents using GRPO. Enables on-the-job training across Qwen, GPT-OSS, Llama and other models.

AI Agents · Reinforcement learning · Open source
SIG 72 · HYP 35