Archives

June 2026

449 articles

GitHub Trending

chopratejas / headroom

Headroom compresses tool outputs, logs, files, and RAG chunks before they are sent to an LLM, reducing token consumption by 60-95% without degrading answers. It is available as a library, a proxy, and an MCP server.

RAG · MCP · Tools
SIG 72 · HYP 25
Reddit r/MachineLearning

Backpropagation destroys V1 brain alignment in one epoch, tracking RSA alignment to fMRI across training for BP, FA, predictive coding, and STDP [R]

A comparative study of learning rules (backprop, feedback alignment, predictive coding, STDP) that measures RSA alignment with human V1 fMRI throughout training. Backprop destroys 90% of V1 alignment after one epoch (r: 0.102→0.011), while predictive coding (PC) and STDP lose only 25-31%. At epoch 40, PC and STDP remain far better aligned than BP and FA. The results suggest a fundamental trade-off between global error signals (higher layers) and early-layer alignment.

Alignment · Benchmarks · Papers
SIG 78 · HYP 15
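RSA compares two systems through their representational dissimilarity matrices (RDMs) instead of their raw activations, which is what allows a network layer to be scored against fMRI voxels. The sketch below is minimal and makes two assumptions that the post does not confirm: RDMs use correlation distance, and the two RDMs are compared with a Spearman correlation over their upper triangles. Both are common RSA choices. The arrays are random stand-ins, not the study's data, and the helper names are illustrative.

```python
import numpy as np


def rdm(responses: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson r between stimulus patterns.

    responses has shape (n_stimuli, n_features), e.g. network units or fMRI voxels.
    """
    return 1.0 - np.corrcoef(responses)


def _ranks(x: np.ndarray) -> np.ndarray:
    # Simple ranking without tie handling (assumes continuous-valued data).
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(len(x))
    return ranks


def rsa_score(model_responses: np.ndarray, brain_responses: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of two RDMs."""
    a, b = rdm(model_responses), rdm(brain_responses)
    iu = np.triu_indices_from(a, k=1)  # off-diagonal entries only
    return float(np.corrcoef(_ranks(a[iu]), _ranks(b[iu]))[0, 1])


rng = np.random.default_rng(0)
layer = rng.normal(size=(20, 64))    # hypothetical: 20 stimuli x 64 units
voxels = rng.normal(size=(20, 100))  # hypothetical: 20 stimuli x 100 voxels
print(rsa_score(layer, voxels))
print(rsa_score(layer, layer))       # ~1.0 for identical representations
```

Tracking alignment across training would mean recomputing `rsa_score` for each checkpoint's layer activations against a fixed brain RDM.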
Reddit r/LocalLLaMA

Building a free, offline LLM “tutor” grounded in one university textbook — RAG, LoRA, or both? Sanity check wanted

A developer wants to build a free, offline AI tutor grounded in a single university textbook. The proposed architecture uses RAG as the core (chunking, embedding, and retrieval with page/section citations), plus optional LoRA fine-tuning for pedagogical style. Open questions cover model selection (Qwen, Gemma), handling complex structures such as figures and equations, and packaging for non-technical users.

RAG · Fine-tuning · Open source
SIG 35 · HYP 15
Reddit r/MachineLearning

LLM agents patch security bugs, pass all tests, but still leave the vulnerability open [R]

CVE-Bench evaluates 5 frontier models on 20 real-world CVEs (Pillow, GitPython, urllib3, etc.) across 300 runs. The maximum solve rate is 50% (60% in the advisory condition). Agents produce syntactically valid patches that pass the tests but leave the vulnerability open. Gaps between model families are significant (OpenAI vs. Laguna, p<0.05), while differences within a family are mostly noise. Failure modes include wrong-search drift, hallucinations, and context loss.

AI Agents · Benchmarks · AI safety
SIG 78 · HYP 15
arXiv cs.LG

Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection

A novel approach to detecting Major Depressive Disorder (MDD) from EEG without data augmentation. SGC (Score-Guided Classification) uses an unsupervised generative network to model pathological anomalies as a prior, which is fused with deep feature representations. A Cross-Channel Spatial Adaptation module handles channel heterogeneity across recording centers. The method is validated on the Mumtaz2016 and MODMA datasets.

Papers · Evals · Vision
SIG 72 · HYP 28
arXiv cs.LG

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

RAFT is a two-stage domain fine-tuning method that mitigates catastrophic forgetting. It first refines data through self-conditioned rewriting and answer fusion. It then applies on-policy distillation, in which the original model provides soft targets on student-generated trajectories. Across five domains, RAFT improves domain accuracy by 23.2% over standard SFT and recovers 18.2% of the degradation on MS-Bench.

Fine-tuning · Reinforcement learning · Papers
SIG 78 · HYP 15
arXiv cs.CL

Graph-Augmented Retrieval for Cross-Entity Financial Sentiment Analysis: A Comparative Study

A comparative study of a two-hop Graph-RAG architecture versus standard vector-only RAG for cross-entity financial sentiment analysis. On 100 queries (30 direct, 70 relational), Graph-RAG improves entity recall (+6.4%, p<0.001) and answer relevancy on complex queries (+11.7%) with no loss in quality. Latency rises by a modest 22.6%, while variance drops by 80%.

RAG · Benchmarks · Papers
SIG 78 · HYP 15
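In a two-hop Graph-RAG retriever, the entities found by vector search are expanded to their neighbors and their neighbors' neighbors in an entity graph before the context is assembled. The sketch below does this with a breadth-first search over a plain adjacency dict. The graph, the entity names, and the `two_hop_expand` helper are all hypothetical, and the study's actual graph construction and ranking are not described in the summary.

```python
from collections import deque


def two_hop_expand(graph: dict[str, set[str]], seeds: list[str], max_hops: int = 2) -> dict[str, int]:
    """Return every entity within max_hops of any seed, mapped to its hop distance."""
    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, set()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical entity graph: edges link companies through supply relations.
graph = {
    "AcmeCorp": {"ChipSupplier"},
    "ChipSupplier": {"AcmeCorp", "RareEarthMiner"},
    "RareEarthMiner": {"ChipSupplier", "ShippingCo"},
}
print(two_hop_expand(graph, ["AcmeCorp"]))
# {'AcmeCorp': 0, 'ChipSupplier': 1, 'RareEarthMiner': 2}
```

The hop limit matters for cost. Each extra hop can multiply the candidate set, which is one plausible source of the added latency the study reports.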
arXiv cs.AI

MindZero: Learning Online Mental Reasoning With Zero Annotations

MindZero is a self-supervised reinforcement learning framework that trains multimodal LLMs to infer human mental states without annotations. The model is rewarded for generating mental-state hypotheses that maximize the likelihood of observed actions. After training, inference is a fast single pass, and the model outperforms model-based methods in both accuracy and efficiency.

Reasoning · Reinforcement learning · AI Agents
SIG 72 · HYP 25
arXiv cs.AI

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

TAPS introduces target-aware prefix selection for diffusion-drafted speculative decoding. It converts diffusion marginals into path-conditioned acceptance estimates, then uses them to select a compact prefix-closed subtree under a fixed verification budget. It reports a 7.9x lossless speedup over vanilla autoregressive decoding, and speedups of 1.36x and 1.74x over DFlash and DDTree, respectively.

Code generation · Reasoning · Benchmarks
SIG 78 · HYP 15
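Under a fixed verification budget, the goal is a prefix-closed draft tree (every selected node's parent is also selected) that maximizes expected accepted length. The sketch below uses best-first selection and assumes the naive acceptance estimate: the product of independent per-position marginals along a path. TAPS's path-conditioned estimator is meant to improve on exactly this naive product, so treat the code as the baseline idea rather than the paper's method. The marginals are made up and `select_prefix_tree` is a hypothetical helper.

```python
import heapq


def select_prefix_tree(marginals: list[dict[str, float]], budget: int) -> list[tuple[str, ...]]:
    """Best-first selection of up to `budget` draft prefixes.

    marginals[i] maps candidate tokens at draft position i to a probability.
    A prefix's score is the product of its tokens' marginals. Children are only
    pushed after their parent is selected, so the result is prefix-closed; since
    a child never outscores its parent (probabilities <= 1), the picks are also
    the `budget` highest-scoring prefixes.
    """
    heap = [(-1.0, ())]  # (negated score, prefix); the empty root is always present
    selected: list[tuple[str, ...]] = []
    while heap and len(selected) < budget:
        neg_score, prefix = heapq.heappop(heap)
        if prefix:
            selected.append(prefix)
        depth = len(prefix)
        if depth < len(marginals):
            for token, p in marginals[depth].items():
                heapq.heappush(heap, (neg_score * p, prefix + (token,)))
    return selected


marginals = [{"a": 0.9, "b": 0.1}, {"c": 0.8, "d": 0.2}]
print(select_prefix_tree(marginals, budget=3))
# [('a',), ('a', 'c'), ('a', 'd')]
```

Under these assumptions, the sum of the selected prefixes' scores (0.9 + 0.72 + 0.18 = 1.8 here) estimates the expected number of accepted draft tokens.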
arXiv cs.CL

Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models

A study on preventing catastrophic forgetting during continual pretraining of multilingual language models. The authors propose five parameter alignment strategies, including layer freezing, regularization, post-hoc reversion, and model merging. They test them across 32 languages and four evaluation axes. Parameter alignment substantially reduces forgetting while maintaining language acquisition.

Fine-tuning · Papers · Benchmarks
SIG 78 · HYP 15
arXiv cs.AI

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Decoder-only models hit an information-theoretic limit on deterministic state-tracking tasks beyond about 25 steps. An Attention Bottleneck Theorem bounds their capacity at O(H·log(L/H)·√d_h). Across 12 models and 8 domains (including SWE-Bench, WebArena, and SQL), tool delegation scores 86-94% versus 24-42% for pure neural reasoning. Fine-tuning improves results by less than 5%, which the authors read as confirming an architectural ceiling.

Reasoning · AI Agents · Benchmarks
SIG 78 · HYP 25
arXiv cs.AI

Threshold-Based Exclusive Batching for LLM Inference

A paper on batching for LLM inference. The authors show that mixed batching (MB) is suboptimal on bandwidth-constrained GPUs: exclusive batching (EB) achieves 41.9% higher throughput on an RTX PRO 6000 (1.792 TB/s of memory bandwidth). They propose EB+, a hybrid scheduler that switches dynamically between EB and MB based on GPU bandwidth, model size, and workload composition. EB+ gains 36.4% under non-stationary traffic.

Infrastructure · Benchmarks · Papers
SIG 78 · HYP 15
arXiv cs.AI

Closed-Loop Neural Activation Control in Vision-Language-Action Models

CTRL-STEER introduces a closed-loop control framework for Vision-Language-Action (VLA) models. Instead of fixed steering coefficients, it adapts intervention strength over time using PID or reinforcement-learning controllers. Experiments on OpenVLA with the LIBERO task suites show more stable concept regulation and better trade-offs between steering and task success, without retraining the base model.

Vision · AI Agents · Reinforcement learning
SIG 72 · HYP 18
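Closed-loop steering replaces a fixed coefficient with a controller that reacts to how strongly the target concept is currently expressed. The sketch below is a minimal discrete PID loop. It assumes a scalar concept score can be measured at every step. The linear `concept_response` plant and the gains are hypothetical stand-ins for a real VLA model, not CTRL-STEER's setup.

```python
class PIDController:
    """Discrete PID controller that outputs a steering coefficient."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float = 1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, target: float, measured: float) -> float:
        error = target - measured
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def concept_response(coefficient: float) -> float:
    # Hypothetical plant: the concept score grows linearly with steering strength.
    return 0.2 + 0.5 * coefficient


controller = PIDController(kp=0.5, ki=0.5, kd=0.0)
coefficient, target = 0.0, 0.6
for step in range(60):
    measured = concept_response(coefficient)
    coefficient = controller.update(target, measured)
print(round(concept_response(coefficient), 3))  # 0.6
```

The integral term is what removes steady-state error here. A proportional-only controller would settle short of the target, which is one argument for closed-loop control over a hand-tuned constant.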
arXiv cs.AI

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

A survey paper proposing the Intelligent Computing Architecture Model (ICAM), a six-layer framework for model-native computing. It maps classical computer architecture concepts onto LLM systems (cache management, context, agents). It introduces three design laws: the Semantic Locality Law, the Context Budget Law, and the Agent Speedup Law. It also distinguishes a probabilistic execution plane from a deterministic control plane.

AI Agents · Multi-agent · Reasoning
SIG 72 · HYP 25
arXiv cs.CL

BOUTEF: A Multilingual Corpus for FakeNews in North Africa -- Language as a Weapon

BOUTEF is a multilingual corpus from two countries (Algeria and Tunisia) covering fake news, authentic narratives, comments, and debunking. It includes Modern Standard Arabic (MSA), Algerian and Tunisian dialects, Arabizi, French, English, and code-switched text. Analysis shows that fake news relies on emotionally charged narratives and sensational framing, while debunking adopts a factual, verification-oriented style.

Papers · Benchmarks · AI safety
SIG 72 · HYP 18
arXiv cs.AI

TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

TIGER is an inference-time framework for mitigating hallucinations in multimodal generation. It independently extracts an observation graph from the input and a claim graph from the output, then assigns each claim a risk score based on support and conflict. High-risk claims are repaired while the backbone stays frozen. A convergence analysis shows that risk decreases geometrically toward an explicit asymptotic bound.

Reasoning · Vision · Papers
SIG 78 · HYP 22
arXiv cs.CL

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

AEyeDE introduces an attention-based attribution framework for detecting AI-generated text using attention matrices from a proxy Transformer model. A lightweight CNN learns discriminative representations from these attribution maps. The method outperforms text-only baselines, shows strong generator-specific detection, and demonstrates robustness under cross-dataset transfer and spelling perturbations.

Papers · AI safety · Evals
SIG 72 · HYP 18
arXiv cs.CL

TCAR-Gen: Temporal Graph Retrieval with Evidence Fusion for Knowledge-Grounded Generation

TCAR-Gen combines query-conditioned graph neural networks, temporal evidence fusion, and chain-of-trees reasoning for retrieval-augmented generation. It achieves a Recall@5 of 0.3738 on the Victorian Crime Diaries benchmark, outperforming Vanilla RAG, Temporal RAG, and GraphRAG variants. Cross-model evaluation from GPT-OSS 20B down to TinyLlama 1.1B shows that retrieval coverage holds up at smaller scales.

RAG · Reasoning · Benchmarks
SIG 72 · HYP 18
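Recall@k measures how many of a query's relevant evidence items appear among the top k retrieved results, averaged over queries. The sketch below assumes the per-query fraction definition and a retrieved list without duplicates. Some papers instead report a hit rate, which only checks whether any relevant item appears. The summary does not say which variant produced 0.3738, and the document IDs below are made up.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average Recall@k over (retrieved, relevant) pairs, one per query."""
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d7", "d3", "d9", "d2", "d4"], {"d3", "d4"}),  # d4 ranked 6th: 1 of 2 found
    (["d5", "d8", "d6", "d0", "d1"], {"d5"}),              # 1 of 1 found
]
print(mean_recall_at_k(runs))  # 0.75
```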
arXiv cs.CL

Which Institutional Frameworks Do Chatbots Assume? Auditing Jurisdictional Defaults in Multilingual LLMs

An audit of 7 LLMs (from the US and China) based on 2,520 responses to 60 legal-administrative prompts in English and Mandarin. Models default to the institutional framework associated with the input language: 74.5% of English responses adopt the US framework, and 53.3% of Chinese responses adopt the Chinese framework. This creates a risk of jurisdictional misselection when a user's preferred language differs from the applicable jurisdiction.

Benchmarks · AI safety · Regulation
SIG 78 · HYP 15
arXiv cs.LG

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

An iterative AI workflow optimizes graphite-based anodes through sequential learning and experimental feedback loops, with the Citrine Platform generating surrogate models and refining manufacturing constraints. The results are:

- Fabrication reliability improved from frequent failures to 100% success.
- The share of cells reaching ≥350 mAh/g rose from 28.4% to 84.8%.
- Capacity retention rose from 42.1% to 97.3%.

Reinforcement learning · Benchmarks · Tools
SIG 75 · HYP 15
arXiv cs.LG

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

ARCA introduces a token-level credit assignment method for LLM reinforcement learning. It addresses the degeneracy of intrinsic signals (surprisal, entropy reduction, policy divergence) under LoRA. It measures adapter salience directly, as the L2 norm of hidden-state residuals, instead of through shifts in the output distribution. Tested with GRPO on MATH using Qwen3-1.7B, ARCA avoids pathological weight concentration.

Reinforcement learning · Fine-tuning · Reasoning
SIG 75 · HYP 15
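ARCA's salience signal is the size of the LoRA adapter's contribution to each token's hidden state, measured with an L2 norm rather than through output-distribution shifts. The numpy sketch below rests on two assumptions. First, hidden states with and without the adapter are both available. Second, salience is normalized into weights that spread a GRPO-style group-relative advantage across tokens. How ARCA actually turns salience into credit is not described in the summary, so `token_credit` is a hypothetical rule. The hidden states are random stand-ins.

```python
import numpy as np


def group_relative_advantage(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style advantage: standardize rewards within a group of sampled responses."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)


def adapter_salience(h_with_adapter: np.ndarray, h_base: np.ndarray) -> np.ndarray:
    """Per-token L2 norm of the adapter's hidden-state residual. Inputs: (tokens, hidden)."""
    return np.linalg.norm(h_with_adapter - h_base, axis=-1)


def token_credit(advantage: float, salience: np.ndarray) -> np.ndarray:
    """Hypothetical credit rule: split a sequence-level advantage in proportion to salience."""
    weights = salience / (salience.sum() + 1e-8)
    return advantage * len(salience) * weights  # uniform salience -> every token gets ~advantage


rng = np.random.default_rng(0)
h_base = rng.normal(size=(4, 8))  # hypothetical: 4 tokens, hidden size 8
residual = np.zeros((4, 8))
residual[2] = 1.0                 # the adapter changes token 2 the most
residual[0] = 0.1
adv = group_relative_advantage(np.array([1.0, 0.0, 0.0, 1.0]))[0]  # this response's advantage
credit = token_credit(adv, adapter_salience(h_base + residual, h_base))
print(credit.round(3))
```

Tokens the adapter leaves untouched receive no credit under this rule. The sketch makes no claim about whether the real method adds smoothing or a floor to that credit.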
arXiv cs.CL

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

A BiGRU architecture enhanced with a KAN (Kolmogorov-Arnold Network) block for legal document classification and summarization in a low-resource multilingual setting. On a corpus of Bengali, English, and transliterated documents from Bangladesh, it reaches 67.96% classification accuracy (F1 = 0.65) and summarization ROUGE-1/2/L scores of 0.38/0.23/0.31. An ablation study shows that the KAN block raises classification accuracy from 57.34% to 67.96%.

Benchmarks · Fine-tuning
SIG 45 · HYP 25
arXiv cs.AI

Product-Aware Deep Autoencoders for Robust Process Monitoring in Multi-Product Cyber-Physical Systems

Proposes product-aware autoencoders for anomaly detection in multi-product cyber-physical systems. Traditional global models create blind spots in which attacks can evade detection. On the Tennessee Eastman Process benchmark, the product-aware model achieves 100% detection accuracy in attack scenarios, versus 22.2% for the global baseline.

Benchmarks · AI safety · Evals
SIG 72 · HYP 15
arXiv cs.CL

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

SPADER is an RL framework for tool-augmented LLM agents in multi-answer QA. It introduces Step-wise Peer Advantage (SPA) for fine-grained credit assignment over long trajectories. It also adds a diversity-aware exploration reward that promotes the discovery of rare entities. Evaluated on QAMPARI, Mintaka, WebQSP, and QUEST, it improves recall and F1 over prompting and supervised RL baselines.

AI Agents · Reinforcement learning · Reasoning
SIG 78 · HYP 18
arXiv cs.CL

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

A study of the limits of LLM adaptation for annotation tasks. Toxicity-detection experiments across diverse datasets show that about two-thirds of zero-shot errors resist correction through prompting (rescue rate 34.8%). Models keep following misaligned definitions while remaining confident. A Definition-Specific Familiarity (DSF) metric correlates with performance (r = +0.41) and outperforms memorization metrics as a predictor.

Prompt engineering · Evals · Benchmarks
SIG 78 · HYP 15
arXiv cs.LG

Adversarially Robust Control of Conditional Value-at-Risk via Rockafellar-Uryasev Conformal Inference

An online, distribution-free framework for controlling Conditional Value-at-Risk (CVaR) in non-stationary and adversarial environments. It combines conformal tail-risk control, online learning, and the Rockafellar-Uryasev variational representation. It provides provable safety guarantees for nonlinear tail risk under arbitrary data-generating processes. Applications include portfolio risk management and LLM toxicity mitigation.

Papers · AI safety · Reasoning
SIG 72 · HYP 15
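The Rockafellar-Uryasev representation writes CVaR as a minimization over a threshold η: CVaR_α(X) = min over η of { η + E[(X − η)+] / (1 − α) }, and a minimizing η is the value-at-risk. The sketch below checks that static identity on an empirical sample. It does not reproduce the paper's online, conformal, adversarially robust machinery. The loss values are made up and the function names are illustrative.

```python
def ru_objective(losses: list[float], eta: float, alpha: float) -> float:
    """Rockafellar-Uryasev objective: eta + E[(X - eta)_+] / (1 - alpha)."""
    excess = sum(max(x - eta, 0.0) for x in losses) / len(losses)
    return eta + excess / (1.0 - alpha)


def empirical_cvar(losses: list[float], alpha: float) -> float:
    """CVaR at level alpha for an empirical distribution.

    The objective is convex and piecewise linear in eta, with kinks only at the
    sample values, so its minimum is attained at one of them.
    """
    return min(ru_objective(losses, eta, alpha) for eta in losses)


losses = [float(x) for x in range(1, 11)]  # hypothetical losses 1..10
print(round(empirical_cvar(losses, alpha=0.8), 6))  # mean of the worst 20%: (9 + 10) / 2 = 9.5
```

The variational form matters for online control because the objective is convex in η and is linear in the expectation. That is what makes it amenable to online learning updates, in contrast with a sorted-tail average.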