Topic

#Papers

Papers are scientific research articles published by labs or universities to present new findings in AI. For example, the paper "Attention Is All You Need" (Google, 2017) introduced the Transformer architecture.

40 Articles

3 Sources

73 Avg. signal

arXiv cs.CL·Jun 18

Learning Robust Pair Confidence for Multimodal Emotion-Cause Pair Extraction

RPCL, a training-only framework for multimodal emotion-cause pair extraction, improves pair-confidence robustness. Using margin constraints and contextual corruption, it increases Pair F1 by 2.58–2.83 points on ECF/MECAD/MEC4 without changing inference.

Papers Benchmarks Vision

arXiv cs.CL·Jun 18

Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification

Local de-identification framework for educational dialogues. Two-stage cascade: a union proposer (lightweight encoders + deterministic rules) generates PII candidates, then a binary Redact/Keep reviewer uses dialogue context and speaker role. Achieves 0.958 macro F1 on math tutoring transcripts, outperforming a commercial API (0.706) and a local LLM baseline (0.767), and runs on a single laptop.

RAG AI safety Papers

arXiv cs.CL·Jun 18

LLM Parameters for Math Across Languages: Shared or Separate?

Mechanistic analysis of mathematical reasoning in multilingual LLMs. Math-associated parameters exhibit partial cross-lingual overlap, concentrated in intermediate layers. English produces the largest set of math-relevant parameters, while lower-resource languages reveal smaller parameter sets.

Reasoning Papers Benchmarks

arXiv cs.CL·Jun 18

PreUnlearn: Auditing Collateral Knowledge Damage Before Large Language Model Unlearning

Study of collateral damage in LLM machine unlearning. Authors show damage propagates beyond the forget set following semantic distance gradients, and propose PreUnlearn, a pre-unlearning prediction method to audit risks before execution.

AI safety Alignment Papers

arXiv cs.CL·Jun 18

Steerable Cultural Preference Optimization of Reward Models

Novel SCPO algorithm for training reward models that balance diverse cultural preferences across subcommunities. Achieves 7-point improvements for minority reward models on PRISM and GlobalOpinionQA (7 countries), with 280% better training data efficiency than full-finetuning.

Alignment Reinforcement learning Evals

arXiv cs.CL·Jun 18

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Morpheus is a morphology-aware neural tokenizer for agglutinative Turkish. The model uses differentiable Poisson-binomial dynamic programming to segment morphemes with 1.425 bits-per-character compression and MorphScore macro-F1 of 0.61 (vs ~0.32 for subword tokenizers). Lossless by construction: decode(encode(w)) = w.

Embeddings Papers Open source

arXiv cs.CL·Jun 18

Output Vector Editing for Memorization Mitigation in Large Language Models

Memorization suppression method in LLMs via output vector editing of MLP neurons. Tested on 4 models (360M-7B parameters), achieves 87.9% suppression on OLMo-7B with 6831 memorized sequences. Complementary approach to existing neuron ablation methods.

AI safety Alignment Papers

arXiv cs.CL·Jun 18

RedactionBench

RedactionBench is a manually annotated benchmark of 200 documents across 11 domains for evaluating PII redaction in context. Introduced with R-Score, a character-level metric, it shows 35 models (NER, SLM, frontier models) fail on contextual redactions: human consensus 89.4% for mandatory redactions, 47.7% for contextual ones.

Benchmarks AI safety Evals

arXiv cs.CL·Jun 18

Beyond Scalar Scores: Exploring LLM-based Metrics for Clinical Significance Evaluation in Radiology Reports

Study on evaluating AI-generated radiology reports. Researchers show existing LLMs over-penalize harmless rephrasings while detecting clinical errors. They train lightweight metrics on Qwen3-8B and MedGemma-4B that outperform 32B medical models; a dataset and metric release is planned.

Benchmarks Evals Papers

arXiv cs.CL·Jun 18

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

ScholarSum introduces a hierarchical knowledge graph framework for abstractive scientific summarization. The system organizes documents into semantically coherent units, generates an initial draft, then refines it through iterative verification and rewriting to ensure logical coherence and factual faithfulness.

Papers RAG Reasoning

arXiv cs.CL·Jun 18

Approximate Structured Diffusion for Sequence Labelling

New approach combining diffusion and CRFs for sequence labelling in NLP. The method conditions a CRF on the full noisy label sequence, bypassing the span limitations of standard CRFs. Result: 16.5% error reduction on POS tagging.

Papers Reasoning Benchmarks

arXiv cs.LG·Jun 18

Gaussian Mixture Attention: Linear-Time Sequence Mixing via Probabilistic Latent Routing

Gaussian Mixture Attention (GMA) replaces standard attention with probabilistic routing through K learned Gaussian mixture components. Queries and keys map to responsibility vectors in a shared latent space. GMA avoids explicit N×N matrix materialization, reducing memory complexity to O(NK) instead of O(N²). Competitive on long-context classification, but behind SDPA and Mamba on WikiText-103.
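
To make the O(NK) claim concrete, here is a minimal NumPy sketch of mixing through K Gaussian components. It is an illustration under stated assumptions, not the paper's formulation: the isotropic Gaussians, the uniform prior, and the names `responsibilities` and `gma_mix` are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def responsibilities(x, means, log_var):
    """Posterior over K isotropic Gaussians (uniform prior) for each row of x.

    x: (N, d), means: (K, d), log_var: (K,). Returns an (N, K) matrix.
    Assumption: isotropic components; the paper may parametrize them differently.
    """
    d = x.shape[1]
    sq_dist = ((x[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    log_p = -0.5 * sq_dist / np.exp(log_var) - 0.5 * d * log_var
    return softmax(log_p, axis=1)

def gma_mix(q, k, v, means, log_var, eps=1e-9):
    """Route values through K components without forming an N x N matrix."""
    r_q = responsibilities(q, means, log_var)  # (N, K)
    r_k = responsibilities(k, means, log_var)  # (N, K)
    mass = r_k.sum(axis=0)                     # (K,) total key mass per component
    # Each component summarizes the values of the keys assigned to it.
    summaries = (r_k.T @ v) / (mass[:, None] + eps)  # (K, d_v)
    # Each query reads a responsibility-weighted blend of the K summaries.
    return r_q @ summaries                     # (N, d_v)
```

The two responsibility matrices and the K summaries are the only sequence-sized objects that are stored, so the stored state grows as N times K rather than N squared. The implied N×N weight matrix, `r_q @ (r_k / mass).T`, is never built.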

Reasoning Benchmarks Papers

arXiv cs.LG·Jun 18

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Artemis is a causal framework for graph neural networks addressing demographic confounders (age, sex) in multimodal brain imaging (fMRI + DTI). The method applies causal interventions at each brain region independently to learn invariant representations. Tested on ADNI, OASIS, and HCP benchmarks, it improves disease diagnosis and classification tasks.

Papers Reasoning Alignment

arXiv cs.LG·Jun 18

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

New geometric complexity measure called Fisher width, a Fisher-geometric analogue of Gaussian width on statistical manifolds. Replaces Euclidean geometry with Fisher information metric to capture local statistical curvature. Develops foundational theory with generalization bounds and computable estimators, validated on MNIST.

Papers Benchmarks Evals

arXiv cs.LG·Jun 18

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SAGE is a post-hoc method to improve selective unlearning in LLMs. It corrects final update vectors by suppressing components damaging retention, without rerunning the original unlearning pipeline. Tested across multiple methods and scales, SAGE reduces the forget-retain trade-off.

Alignment Papers

arXiv cs.LG·Jun 18

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Ghost Attractor Networks introduce an efficient dynamical decoder for sequential generation in robotics. With 2.3M parameters, it matches the offline accuracy of a 1.07B-parameter Diffusion Transformer (462× fewer parameters, 32× lower latency). On LIBERO-10, phase conditioning improves success rate by 13.5 percentage points over MLP baseline.

Code generation Robotics Reasoning

arXiv cs.LG·Jun 18

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

Survey of AI-based models for soil moisture estimation and classification. Five categories compared: statistical time-series, geostatistical methods, classical ML, deep learning, and Bayesian approaches. Data-driven methods provide flexible alternatives to computationally expensive physics-based models.

Benchmarks Papers

arXiv cs.LG·Jun 18

Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

SWave is a complex-valued recurrent language model (169M parameters) trained on FineWeb-Edu. The paper documents its evolution across three phases, identifying structural failures (cos-domination collapse) and validating critical components (ComplexNorm, Wave Propagation Scan). Final PPL: 22.0 at step 89,861.

Papers Reasoning Benchmarks

arXiv cs.LG·Jun 18

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

Study on grokking (delayed transition from memorization to generalization). Authors show weight norm doesn't directly control grokking delay but acts through logit scale. Fixing norm and varying output temperature, they recover 85% of delay by matching logit scale. Effect is loss-dependent (cross-entropy vs MSE). Logit scale and softmax saturation are the proximal variables.
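
A small sketch of why weight norm and output temperature can be interchangeable from the point of view of the loss. It assumes a bias-free linear readout, which is a simplification chosen for this example and not the paper's experimental setup; the function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def logits_scaled_weights(W, x, c):
    """Logits after multiplying the readout weights (and their norm) by c."""
    return (c * W) @ x

def logits_with_temperature(W, x, temperature):
    """Logits with the weights fixed and an output temperature applied."""
    return (W @ x) / temperature

def ce_grad_norm(logits, label):
    """Norm of the cross-entropy gradient w.r.t. the logits: softmax - one_hot."""
    g = softmax(logits)
    g[label] -= 1.0
    return float(np.linalg.norm(g))

# Toy readout: 3 classes, 2 features; class 0 is the correct, already-fit label.
W = np.array([[2.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
x = np.array([1.0, 0.5])
label = 0

for c in (1.0, 4.0, 16.0):
    same = np.allclose(logits_scaled_weights(W, x, c),
                       logits_with_temperature(W, x, 1.0 / c))
    print(c, same, ce_grad_norm(logits_scaled_weights(W, x, c), label))
```

In this toy case, scaling the weights by c produces exactly the logits obtained with temperature 1/c, and the cross-entropy gradient shrinks as the softmax saturates. That is the sense in which logit scale, rather than weight norm itself, is the quantity the loss sees.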

Papers Reasoning Evals

arXiv cs.LG·Jun 18

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

The QAQL framework couples quantum annealing with Q-learning for remaining useful life (RUL) prediction in predictive maintenance. Each Q-value update is encoded as a QUBO and solved on a D-Wave Advantage system. Validated on NASA C-MAPSS and fleet maintenance datasets, it shows statistically significant improvements over classical and quantum baselines.
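
For readers unfamiliar with QUBO (quadratic unconstrained binary optimization), the sketch below shows the general shape of such a problem. The summary does not say how QAQL encodes its updates, so the encoding here (picking the highest-valued action under a one-hot penalty) is purely an assumption for illustration, and a brute-force search stands in for the annealer. The names `one_hot_argmax_qubo` and `solve_qubo_brute_force` are hypothetical.

```python
from itertools import product

import numpy as np

def one_hot_argmax_qubo(q_values, penalty):
    """Build Q so that minimizing x^T Q x over binary x selects argmax(q_values).

    Objective: -sum_a q_a x_a + penalty * (sum_a x_a - 1)^2, constant dropped.
    Assumption: penalty exceeds every |q_a| so that one-hot solutions win.
    """
    n = len(q_values)
    Q = penalty * np.ones((n, n))
    Q[np.diag_indices(n)] = -np.asarray(q_values, dtype=float) - penalty
    return Q

def solve_qubo_brute_force(Q):
    """Enumerate all binary vectors; feasible only for a handful of variables."""
    n = Q.shape[0]
    best_x, best_energy = None, float("inf")
    for bits in product((0, 1), repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy

Q = one_hot_argmax_qubo([0.2, 0.9, 0.5], penalty=2.0)
x_best, energy = solve_qubo_brute_force(Q)
print(x_best, energy)
```

An annealer would replace the exhaustive loop with a hardware sampler, but the problem it receives is still just the matrix Q.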

Reinforcement learning Benchmarks Papers

arXiv cs.AI·Jun 18

What Must Generalist Agents Remember?

Theoretical paper on memory requirements for generalist agents. Proves that agents performing near-optimally across multiple domains must maintain distinct memory distributions at observational bottlenecks. Memory enables domain disambiguation, transition-model reconstruction, and planning.

AI Agents Reasoning Papers

arXiv cs.AI·Jun 18

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

WorldLines is a long-horizon embodied agent benchmark testing memory in dynamic household environments. The dataset includes temporally extended traces with dialogues, actions, and object/device state changes. ObsMem, an observer-grounded memory framework, maintains visibility-aware memories and action-native state trails for state-informed decisions.

AI Agents Benchmarks Reasoning

arXiv cs.AI·Jun 18

Generative-Model Predictive Planning for Navigation in Partially Observable Environments

BeliefDiffusion combines diffusion models and Model Predictive Control for navigation in partially observable environments. The framework generates multimodal belief distributions and plans efficient navigation strategies. Experiments on synthetic maps: outperforms RL and other generative approaches in success rate and path efficiency.

Reasoning Reinforcement learning Papers

arXiv cs.AI·Jun 18

ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection

ThinkDeception introduces a progressive reinforcement learning framework for interpretable multimodal deception detection. Using MLLMs, it converts binary classification into explicit reasoning via Chain of Thought. VAC-GRPO with curriculum learning stratified into 4 difficulty tiers achieves SOTA on mainstream benchmarks.

Reasoning Reinforcement learning Vision

arXiv cs.AI·Jun 18

Analysing drivers and interdependencies in European electricity markets using XAI

Study combining deep neural networks with XAI (SHAP, SSHAP) to analyse 39 European electricity bidding zones. Identifies solar energy as disproportionate price driver, gas prices as dominant factor, and interconnections revealing interdependence of electricity markets.

Evals Papers

arXiv cs.AI·Jun 18

Human-AI Coevolution Dynamics: A Formal Theory of Social Intelligence Emergence Through Long-Term Interaction

New formal theory (HACD-H) modeling emergence of social intelligence in long-term human-AI interaction. Unified framework integrating emotional adaptation, social memory, and personality consistency. Study on 14,700 conversation turns reveals negative correlation between social intelligence and social cognitive energy (r=-0.391, p<0.001), with developmental phase-transition patterns.

Reasoning AI Agents Papers

arXiv cs.AI·Jun 18

Beyond Safe Data: Pretraining-Stage Alignment with Regular Safety Reflection

Safety Reflection Pretraining inserts short safety reflections into pretraining corpora to establish self-monitoring directly in language modeling. On 1.7B models pretrained on FineWeb-Edu, the method improves safety classification accuracy and substantially reduces success rates of inference-stage and finetuning attacks.

AI safety Alignment Reinforcement learning

arXiv cs.AI·Jun 18

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

NeSyCat Torch unifies neurosymbolic semantics (classical, fuzzy, probabilistic, neural) under a single truth definition parametrized by monads. Implemented in PyTorch, JAX, and HaskTorch, the framework interprets computational symbols via neural networks. On MNIST addition, outperforms LTN and DeepProbLog in speed and accuracy.

Reasoning Reinforcement learning Papers

arXiv cs.CL·Jun 18

CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents

CoreMem introduces a memory architecture for personalized dialogue agents on edge devices (8 GB VRAM). Replaces cosine similarity with Fisher-Rao metric for retrieval and uses Fisher-guided token distillation for compression. Achieves +4.51 pp gains in open-domain reasoning and +4.17 pp in temporal reasoning on LOCOMO and LongMemEval-S benchmarks.

AI Agents RAG Embeddings

arXiv cs.CL·Jun 18

Speech-Driven End-to-End Language Discrimination towards Chinese Dialects

The paper presents a speech-driven approach to Chinese dialect discrimination. It combines MFCC features, an HMM-DNN speech recognition model, an attention mechanism, and a CNN. Evaluation on two benchmark Chinese dialect corpora shows improvement over state-of-the-art methods.

Voice Benchmarks Papers

arXiv cs.CL·Jun 18

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

RegMix-D extends RegMix by leveraging full loss trajectories from proxy runs, not just endpoint losses, to predict optimal data mixtures at multiple training stages. Tested on 25B tokens of Pile with a 1B model, RegMix-D outperforms RegMix and DoReMi across 13 downstream tasks while using 75% less proxy compute.

Benchmarks Papers

arXiv cs.CL·Jun 18

LLMs Struggle to Measure What Distinguishes Students of Different Proficiency Levels: A Study of Item Discrimination in Reading Comprehension Assessment

Study evaluating 42 LLMs (proprietary and open-source) on their ability to measure item discrimination in reading comprehension. Models fail: Spearman correlation of 0.152 in direct prediction, 0.241 in CTT calibration. LLMs do not reliably capture how assessment items distinguish students of different proficiency levels.

Benchmarks Evals Papers

arXiv cs.CL·Jun 18

Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

ImpSH, a triplet-based framework, improves implicit hate speech detection by aligning posts with implied statements and using context-bounded semi-hard negatives. Evaluated on IHC, SBIC, and DynaHate with BERT and HateBERT, it enhances cross-domain performance and provides more stable representations than standard supervised contrastive approaches.

Benchmarks AI safety Papers

arXiv cs.CL·Jun 18

Efficient Financial Language Understanding via Distillation with Synthetic Data

Distillation framework with synthetic data for financial sentiment analysis. Knowledge transfer from large instruction-tuned teacher to compact student models. Clustering-based seed selection generates synthetic examples via few-shot prompting. Compact model outperforms teacher on complex/noisy text with minimal supervision.

Fine-tuning RAG Prompt engineering

arXiv cs.LG·Jun 18

CODEBLOCK: Learning to Supervise Code at the Right Granularity

CodeBlock is a structure-aware sparse supervision framework for code LLM fine-tuning. It selects syntactically coherent code blocks rather than isolated tokens, estimating utility via generalized cross-entropy and data-flow signals. On 6 code-generation benchmarks, CodeBlock outperforms full-token SFT while using only 1.9% of supervised response tokens.

Code generation Fine-tuning Papers

arXiv cs.LG·Jun 18

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

Mathematical link established between shock-wave theory and symmetry-quotiented stochastic gradient descent dynamics for neural networks. After quotienting parameter symmetries and entropy coarse-graining, effective dynamics satisfy a viscous Hamilton-Jacobi equation. Applied to MLPs, CNNs, Transformers, and mean-field networks.

Papers Reasoning Reinforcement learning

arXiv cs.LG·Jun 18

DRIFT: Refining Instruction Data via On-Policy Data Attribution

DRIFT refines the SFT training data distribution using on-policy influence functions. The method uses model rollouts as validation targets to minimize the proximity gap and to correct gradient-norm bias. Experiments on 7B instruction and reasoning models show consistent performance ceiling improvements over existing curation baselines.

Fine-tuning Reinforcement learning Evals

arXiv cs.LG·Jun 18

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

RGNet, a neural network architecture based on the renormalization group, addresses class imbalance and multidimensional noise in fault diagnosis. The model hierarchically compresses the feature space and captures both local details and global patterns. Tested on the imbalanced AI4I dataset.

Papers Evals Benchmarks

arXiv cs.LG·Jun 18

ThousandWorlds: A benchmark for climate emulation of potentially habitable exoplanets

ThousandWorlds is an ML benchmark for climate emulation of potentially habitable exoplanets. The dataset contains ~1800 simulations from 5 global climate models mapping 8 planetary parameters to 3D atmospheric fields. Three nested subsets and two evaluation protocols test 7 baselines; GP-based methods outperform standard deep learning.

Benchmarks Papers Reasoning

arXiv cs.LG·Jun 18

Task-Restricted Symmetries in Recurrent Weight Space

Study of functional redundancy in single-layer tanh RNNs using ordered real Schur coordinates. Authors identify nonnormal couplings removable with minimal loss on specific tasks (copy, flip-flop, sine generation), revealing task-dependent approximate functional invariances rather than universal weight-space symmetries.

Papers Reasoning
