Topic

#RAG

RAG (Retrieval-Augmented Generation) is a technique that connects an LLM to an external document base to generate answers grounded in real sources. For example, LlamaIndex lets developers build RAG pipelines by indexing their own data and querying it through a language model.
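The retrieve-then-generate loop described above can be sketched without any framework. This is a minimal illustration, not LlamaIndex's API: `retrieve`, `build_prompt`, and the `DOCS` list are hypothetical names, retrieval uses a simple bag-of-words cosine score in place of a real embedding model, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=2):
    """Assemble a grounded prompt: numbered passages first, then the question."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"


# Hypothetical document base, for illustration only.
DOCS = [
    "The refund window is 30 days from the delivery date.",
    "Shipping to Canada takes 5 to 7 business days.",
    "Gift cards cannot be exchanged for cash.",
]

if __name__ == "__main__":
    print(build_prompt("How long is the refund window?", DOCS))
```

The string returned by `build_prompt` is what would be sent to the model; numbering the passages makes it possible to ask for citations back to specific sources.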

40 Articles

6 Sources

71 Avg. signal

arXiv cs.CL·Jun 18

Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification

Local de-identification framework for educational dialogues. Two-stage cascade: union proposer (lightweight encoders + deterministic rules) generates PII candidates, then binary Redact/Keep reviewer uses dialogue context and speaker role. Achieves 0.958 macro F1 on math tutoring transcripts, outperforms commercial API (0.706) and local LLM baseline (0.767), runs on single laptop.

RAG AI safety Papers

arXiv cs.CL·Jun 18

MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval

MCompassRAG improves RAG systems by using topic-level metadata as a semantic compass for paragraph-level retrieval. The method enriches chunk representations with topic signals in the same embedding space and trains a lightweight retriever via LLM-teacher distillation. Across six benchmarks, it gains 8.24% in information efficiency with 5× lower latency than efficient RAG baselines.

RAG Embeddings Benchmarks

arXiv cs.CL·Jun 18

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

PEC-Home is a simulated home dataset for interpreting progressively elliptical commands in smart homes. Current assistants (including GPT-4o) fail to accurately execute these commands, which grow more abbreviated as shared context accumulates, even when equipped with dialogue history retrieval.

AI Agents Benchmarks RAG

arXiv cs.CL·Jun 18

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

ScholarSum introduces a hierarchical knowledge graph framework for abstractive scientific summarization. The system organizes documents into semantically coherent units, generates an initial draft, then refines it through iterative verification and rewriting to ensure logical coherence and factual faithfulness.

Papers RAG Reasoning

arXiv cs.CL·Jun 18

Improving Medical Communication using Rubric-Guided Counterfactual Recommendations

LM-guided counterfactual recommendation pipeline to improve medical communication in text-based telemedicine. System identifies interpretable features (tone, personalization, clarity, completeness) and recommends minimal communication changes predicted to increase positive feedback (+6.41% mean gain). Modifications preserve medical content and physician control.

Reasoning Evals RAG

arXiv cs.AI·Jun 18

ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch

ProfiLLM is an agentic LLM pipeline deployed at DiDi to extract semantic user profiles from massive behavioral logs. The system uses 27 analytical tools to mine platform-scale data and generates utility-aligned profiles, achieving +6.14% AUC improvement and +0.47% GMV gain in A/B testing.

AI Agents Llama RAG

arXiv cs.CL·Jun 18

CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents

CoreMem introduces a memory architecture for personalized dialogue agents on edge devices (8 GB VRAM). Replaces cosine similarity with Fisher-Rao metric for retrieval and uses Fisher-guided token distillation for compression. Achieves +4.51 pp gains in open-domain reasoning and +4.17 pp in temporal reasoning on LOCOMO and LongMemEval-S benchmarks.

AI Agents RAG Embeddings

arXiv cs.CL·Jun 18

Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

DICE improves long-document retrieval by splitting documents into chunks, encoding each independently, then aggregating vectors into a single representation. On LongEmbed, scores reach 90.0 for Dream Passkey >4k (vs 30.0) and 74.0 for Needle >4k (vs 23.3). The approach reduces the Evidence Dilution Index (EDI) in 92.8% of cases.
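The split, encode, aggregate pipeline can be sketched as follows. The summary does not say how DICE combines chunk vectors, so mean pooling here is an assumption chosen for simplicity, and `embed` is a toy hashed bag-of-words stand-in for a real encoder. All function names are hypothetical.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy stand-in for a real encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 keeps the bucket assignment stable across Python processes.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def split_into_chunks(text, chunk_size=50):
    """Split a document into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def aggregate_document(text, chunk_size=50, dim=64):
    """Encode each chunk independently, then mean-pool into one unit vector.

    Mean pooling is an assumed aggregation rule, not necessarily the paper's.
    """
    chunks = split_into_chunks(text, chunk_size)
    if not chunks:
        return [0.0] * dim
    vectors = [embed(chunk, dim) for chunk in chunks]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm else mean


if __name__ == "__main__":
    doc = "retrieval augmented generation " * 40
    vector = aggregate_document(doc, chunk_size=25)
    print(len(vector))
```

Because every chunk is normalized before pooling, each chunk contributes equally to the document vector regardless of where it sits in the document.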

RAG Embeddings Vector search

arXiv cs.CL·Jun 18

Efficient Financial Language Understanding via Distillation with Synthetic Data

Distillation framework with synthetic data for financial sentiment analysis. Knowledge transfer from large instruction-tuned teacher to compact student models. Clustering-based seed selection generates synthetic examples via few-shot prompting. Compact model outperforms teacher on complex/noisy text with minimal supervision.

Fine-tuning RAG Prompt engineering

arXiv cs.AI·Jun 18

CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework

CaVe-VLM-CoT is a modular agentic-RAG framework reducing VLM hallucinations through a five-stage closed-loop pipeline (Extractor, Retriever, Solver, Citation Injector, Verifier). Ungrounded claims trigger targeted re-retrieval. 23 component-wise metrics and CaVeScore measure citation faithfulness and cross-modal grounding. Results: 87.1% accuracy on ScienceQA, 55.2% on MMMU.

Vision RAG AI Agents

arXiv cs.AI·Jun 18

Decoupling Search from Reasoning: A Vendor-Agnostic Grounding Architecture for LLM Agents

Decoupled Search Grounding (DSG) decouples search from reasoning via an MCP-compatible gateway. On SimpleQA, FreshQA, and HotpotQA, DSG achieves 86.1% accuracy (vs 87.7% native) with 91% lower search cost and 68% lower latency. In production e-commerce workload, DSG cuts search cost by 98% while maintaining accuracy.

AI Agents MCP RAG

Reddit r/LocalLLaMA·Jun 17

We built an open source UI kit for document RAG/agents

Extend releases an open source UI kit (MIT) for document RAG and agents: 15 components for PDF, DOCX, XLSX viewers with bounding box citations, file upload, e-signature. Built internally, tested on millions of pages/day, actively maintained.

RAG AI Agents Open source

GitHub Trending·Jun 17

DeusData / codebase-memory-mcp

High-performance code intelligence MCP server. Indexes codebases into persistent knowledge graph in milliseconds. Supports 158 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.

MCP Code generation RAG

GitHub Trending·Jun 17

infiniflow / ragflow

RAGFlow is an open-source RAG engine combining retrieval-augmented generation with agent capabilities to create a superior context layer for LLMs.

RAG AI Agents Open source

arXiv cs.CL·Jun 17

MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous Speech Translation task

MLLP-VRAIN group participates in IWSLT 2026 simultaneous speech translation using Parakeet and Qwen 3.5 models. Cascaded system with adaptive policies and RAG mechanism for domain-specific context. +5.82 XCOMET-XL improvement on En→De test set versus previous year.

Qwen RAG Code generation

arXiv cs.CL·Jun 17

The Slop Paradox: How Synthetic Standardization Erodes Clinical Uncertainty and Cross-Modal Alignment in AI-Rewritten Radiology Reports

Study of 450 chest X-ray reports showing LLM rewriting for standardization preserves image-text alignment (2.5% degradation) but erodes 26.8–29.3% of clinical entities and 14.9–16.5% of uncertainty language. The paradox: tasks producing 'cleaner' text pull content away from images.

Vision RAG Evals

arXiv cs.AI·Jun 17

When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

An LLM-based self-evolving agent iteratively generates query rewriting rules to enhance BM25 for legal case retrieval. Tested on LeCaRD-v2 (a Chinese benchmark), the framework requires no parameter training and outperforms baselines by automatically evaluating rules and discarding ineffective ones.

AI Agents Reasoning Benchmarks

arXiv cs.AI·Jun 17

FlowRAG: Synergizing Explicit Reasoning via Frequency-Aware Multi-Granularity Graph Flow

FlowRAG improves graph-based retrieval-augmented generation through a multi-granularity heterogeneous graph (passages, summaries, sentences, entities) and frequency-aware weighted flow module. This enhances semantic recall and explicit reasoning for complex multi-hop tasks.

RAG Reasoning Benchmarks

arXiv cs.CL·Jun 17

MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation

MODE-RAG is a multi-agent system driven by Variational Free Energy to reduce hallucinations in Multimodal Retrieval-Augmented Generation (M-RAG). It uses Monte Carlo Tree Search, logit perturbations, and specialized agents to route high-risk queries and perform post-hoc factual verification. The authors introduce ModeVent, a challenging subset of the MultiVent dataset, to evaluate M-RAG robustness.

RAG Multi-agent Vision

arXiv cs.AI·Jun 17

Brick-DICL: Dynamic In-Context Learning for Automated Brick Schema Classification

Brick-DICL introduces a two-stage dynamic in-context learning framework for automated Brick schema classification of BMS points (936 classes). Combines metadata-RAG and class-RAG to enhance LLM domain knowledge, with multi-LLM filtering to reduce manual verification effort.

RAG Prompt engineering Reasoning

arXiv cs.AI·Jun 17

DecoSearch: Complexity-Aware Routing and Plan-Level Repair for Text-to-SQL

DecoSearch is a training-free framework for text-to-SQL translation that routes queries by complexity. A schema selector prunes the database, an LLM judger decides if decomposition is needed, and a DAG solves atomic sub-questions. Achieves 70.53% on BIRD and 88.31% on Spider with DeepSeek, outperforming training-free baselines.

Code generation Reasoning RAG

arXiv cs.AI·Jun 17

DiagFlowBench: Evaluating How Language Models Handle Off-Procedure Inputs in Grounded Diagnostic Dialogue

DiagFlowBench evaluates how language models handle off-procedure inputs in industrial diagnostic dialogue. A dataset of 1,676 multi-turn conversations derived from 50 diagnostic flowcharts reveals models often select a real but contextually inadequate step rather than hallucinate, exposing a vulnerability: plausible but wrong advice grounded in documentation.

Benchmarks Evals Reasoning

Le Big Data·Jun 16

Meta gives Facebook a big AI boost... by tapping public posts

Meta integrates AI into Facebook through a new search mode leveraging public posts. The platform promises faster responses to user queries.

Meta AI RAG

arXiv cs.AI·Jun 16

CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation

CONCORD is an asynchronous sparse aggregation framework for device-cloud RAG with document isolation. It uses waiting debt control and certificate-guided minimal supplementation to reduce synchronization and data transfer. Improves end-to-end throughput by 1.66× to 2.15× on Natural Questions and WikiText-2 while reducing per-token communication by over 100×.

RAG Papers Infrastructure

arXiv cs.AI·Jun 16

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

DR-DCI combines retrieval with Direct Corpus Interaction for agent-based search over large corpora. The system uses a retriever to dynamically populate a local workspace where agents execute precise operations (filtering, comparison, verification). On BrowseComp-Plus, DR-DCI achieves 71.2% accuracy (+8.3 points vs raw DCI) and remains stable up to 10M documents, where raw DCI becomes unstable.

AI Agents RAG Reasoning

arXiv cs.AI·Jun 16

Semantics-Enhanced Retrieval-Augmented Time Series Forecasting

SERAF, a time series forecasting framework, combines retrieval of historical segments with self-generated textual descriptions. Multimodal approach tested on 7 real-world datasets to improve predictions beyond numerical similarity alone.

RAG Benchmarks Papers

arXiv cs.AI·Jun 16

ChatPlanner: A Large Language Model Framework for Personalized Public Transit Routing

ChatPlanner is a framework using fine-tuned LLMs with RAG to extract user preferences from natural language and integrate them into public transit routing optimization. Evaluated on 8 personas and 5 contexts, the system combines fine-tuning (output structure) and RAG (query-specific context) to identify solutions overlooked by existing planners.

RAG Fine-tuning Prompt engineering

arXiv cs.CL·Jun 16

Context Compression Is Not One Thing: Readable Symbolic Re-expression vs. Coherent Summary at Matched Budget

Telegraph English, a readable symbolic format, rewrites retrieved passages into structured entity-relation statements for context compression. On MuSiQue, TwoWiki, and HotpotQA, it outperforms three matched-budget baselines (deletion, truncation, sub-sampling) by 13–20 F1 points, and exceeds coherent prose summaries on the hardest dataset.

RAG Reasoning Benchmarks

arXiv cs.CL·Jun 16

ReportQA: QA-Based Radiology Report Evaluation

ReportQA introduces a QA-based evaluation metric for automated radiology report generation. The framework uses LLMs to extract structured information, generate QA pairs from templates, and evaluate alignment with radiologist judgments. Authors release knowledge trees, structured reports, and code for QA construction and evaluation.

Papers Vision Evals

arXiv cs.CL·Jun 16

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

XBCP, a controlled benchmark, evaluates deep research agents' ability to operate across languages. Four agents tested with dense and sparse retrievers across 12 languages show substantial degradation: evidence recall loss, reduced calibration, unreliable citations. Problems persist even when gold evidence is directly supplied.

AI Agents RAG Benchmarks

arXiv cs.CL·Jun 16

Few-Shot Biomedical Relation Extraction with Large Language Models: A Viable Alternative to Supervised Learning?

Comparative study of few-shot biomedical relation extraction with LLMs vs supervised learning on BioREDirect, comparing pairwise classification and joint generation. Few-shot LLMs reach 0.44 micro-F1 vs 0.56 for the supervised baseline, but 0.45 vs 0.38 macro-F1, outperforming the baseline on rare relations.
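Micro-F1 and macro-F1 can rank two systems differently because micro-F1 pools counts across classes (so frequent relations dominate) while macro-F1 averages per-class scores (so rare relations weigh as much as common ones). A small worked sketch, using made-up counts that are not taken from the study:

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(counts):
    """Pool (tp, fp, fn) over all classes, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def macro_f1(counts):
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


# Hypothetical per-class (tp, fp, fn) counts, for illustration only.
COUNTS = {
    "common_relation": (90, 10, 10),
    "rare_relation": (1, 4, 4),
}

if __name__ == "__main__":
    print(f"micro-F1 = {micro_f1(COUNTS):.3f}")
    print(f"macro-F1 = {macro_f1(COUNTS):.3f}")
```

With these counts the weak rare class barely moves micro-F1 but pulls macro-F1 down sharply, which is why a system that does better on rare relations can lead on macro-F1 while trailing on micro-F1.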

Prompt engineering Benchmarks RAG

arXiv cs.AI·Jun 16

Hierarchical Modeling of ICD Codes in EHR Foundation Models

Study on integrating ICD-10-CM hierarchy into EHR foundation models. Authors compare two approaches: augmenting BERT sequences with hierarchical tokens and injecting hierarchy into graph-based code representations. Experiments on MIMIC-IV and eICU show explicit hierarchy encoding improves predictions in-domain and in cross-dataset transfer.

Papers Embeddings RAG

arXiv cs.AI·Jun 16

Agentic Retrieval and Reinforcement Learned Equation Chains: A Controlled Generation Framework for Complex and Novel Physics Word Problems

ARVRE combines offline reinforcement learning, agentic RAG, and LLMs to generate complex, solvable physics word problems. Stage one builds valid equation chains via temporal-difference learning; stage two converts chains into natural-language questions. Human and automated evaluations show superiority in complexity, novelty, and solvability.

AI Agents RAG Reinforcement learning

arXiv cs.CL·Jun 16

Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

DiSan, a privacy-preserving sanitization framework, factorizes text into two subspaces: one preserving task semantics and one containing stylistic signatures. On a distributed multi-agent RAG benchmark, DiSan reduces PII exposure by 20× while maintaining 83% answer faithfulness, and lowers Enron stylometric attribution by 73.2% (TF-IDF) and 70.6% (neural probe).

Multi-agent RAG AI safety

arXiv cs.CL·Jun 16

T-Mem: Memory That Anticipates, Not Archives

T-Mem proposes a long-term conversational memory architecture that overcomes lexical and vector similarity bounds. The system introduces write-time triggers to enable two recall modes: descriptive (surface features) and associative (latent semantic arcs). T-Mem achieves state-of-the-art on LoCoMo and LoCoMo-Plus benchmarks.

AI Agents RAG Benchmarks

arXiv cs.CL·Jun 16

Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction

Retrieval method for in-context demonstrations using Grammatical Error Representations (GER) for multilingual grammatical error correction. On 8B open-source models, results match GPT-4o-mini and Deepseek2.5. For low-resource languages, F₀.₅ scores improve up to 1.20× over baseline.

RAG Prompt engineering Benchmarks

arXiv cs.CL·Jun 16

Transfer Learning for FHIR Questionnaire Terminology Binding

Retrieval study to automatically bind LOINC codes to FHIR Questionnaire items in healthcare. Six methods tested (TF-IDF, MiniLM, BioBERT, BioLORD, contrastive fine-tuning, GPT reranker) on 97,314 codes. BioLORD (encoder pre-trained on biomedical ontologies) achieves R@1=0.185 without task-specific data; contrastive fine-tuning reaches R@5=0.389. GPT augmentation degrades performance.

Embeddings Fine-tuning RAG

Reddit r/MachineLearning·Jun 15

Cleo: trying to fit full analyst behavior in a 2B model [P]

Cleo is a Qwen 2B-Base fine-tune designed for text-to-SQL tasks. The model integrates training, evaluation, and inference in a unified system with SQL safety layer, dialect handling, and clarification behavior. Code, model, and datasets are fully open-source.

Qwen Fine-tuning Code generation

Reddit r/LocalLLaMA·Jun 15

archex: local-first, deterministic code-context for AI agents — no API key, no telemetry (Apache 2.0)

archex converts a repo into ranked, token-budgeted context for AI agents: symbols, imports, dependency graph. Local-first pipeline (BM25F + embeddings + RRF + reranker) with no API key, no telemetry. Benchmarks: recall 0.95 vs 0.32 (cocoindex-code), cold start 0ms vs 4,721ms, 71% fewer tokens.
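The RRF step named in the pipeline (Reciprocal Rank Fusion) merges several ranked lists by summing 1 / (k + rank) for each item across lists. The sketch below shows the standard formula only; it is not archex's code, `k=60` is the conventional default rather than a value stated in the post, and the file names are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists.

    rankings: list of lists, each ordered best-first.
    Returns item ids sorted by fused score (ties broken alphabetically).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical results from a lexical ranker and an embedding ranker.
    lexical = ["a.py", "b.py", "c.py"]
    dense = ["b.py", "c.py", "a.py"]
    print(reciprocal_rank_fusion([lexical, dense]))
    # → ['b.py', 'a.py', 'c.py']
```

RRF uses only ranks, not raw scores, so it can combine BM25-style and embedding-based rankers without calibrating their score scales against each other.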

Code generation RAG AI Agents

Reddit r/LocalLLaMA·Jun 15

I made a private on-device LLM app for Android (notes + recall, nothing leaves the phone)

Developer releases Android app running LLM fully on-device for note-taking and AI-powered recall. All data stays on phone, no cloud. Seeking beta testers (8GB+ RAM recommended), free, in Google Play closed testing.

Open source Tools RAG

RAG — AI news · Signal IA