arXiv cs.CL

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

GPF-LiveNews is a streaming evaluation protocol for auditing how LLMs frame emerging news events for different audiences. Tested on 23 models across 12 monitoring runs, it measures semantic and sentiment variation across 42 identity labels. Results show that Policy/Action prompts produce the strongest semantic movement, while sentiment variation remains flat across dimensions.

Evals · AI safety · Alignment
SIG 72 · HYP 18

arXiv cs.CL

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Thoughts-as-Planning formalizes reasoning chain optimization as sequential decision-making over latent semantic space. The framework learns a latent world model simulating effects of reasoning chain edits on outputs, supporting multi-scale edits (token, segment, instruction) via gradient descent or reinforcement learning planning.

Reasoning · Reinforcement learning · Prompt engineering
SIG 72 · HYP 28

arXiv cs.CL

Assessing Dutch Syllabification Algorithms and Improving Accuracy by Combining Phonetic and Orthographic Information through Deep Learning

Comparative assessment of four Dutch syllabification algorithms: Brandt Corstius, Liang, Trogkanis-Elkan CRF, and a novel deep learning model. The deep learning model, which combines phonetic and orthographic information, achieves 99.65% word accuracy (+0.14% over the literature). Data-driven algorithms outperform knowledge-based approaches.

Papers · Benchmarks · Code generation
SIG 72 · HYP 15

arXiv cs.CL

A Modular Architecture for Typologically Controlled Lexicon Generation

Modular framework for generating pronounceable, typologically plausible artificial lexicons. It samples phoneme inventories from PHOIBLE, applies three phonological grammars (deterministic, OT, MaxEnt), and assigns meanings via a Swadesh-Leipzig-Jakarta ontology. Evaluated by character n-gram perplexity and KL divergence, the probabilistic grammars outperform baselines on 100-5,000 word forms.
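
The summary does not say which distributions are compared, so the following is only a minimal sketch of one way a KL divergence over character n-grams could be computed. It assumes the comparison is between a generated lexicon and a reference word list; the function names, the `#` boundary padding, the add-one smoothing, and the toy word lists are all hypothetical and not taken from the paper.

```python
from collections import Counter
import math


def char_ngram_counts(words, n=2):
    """Count character n-grams, padding each word with '#' boundary marks."""
    counts = Counter()
    for word in words:
        padded = "#" * (n - 1) + word + "#"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) in nats over the union of n-grams, with add-alpha smoothing."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for gram in vocab:
        p = (p_counts[gram] + alpha) / p_total
        q = (q_counts[gram] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


# Toy word lists, invented purely for illustration.
generated = ["pata", "taki", "kapa"]
reference = ["pata", "tika", "paka"]

gen_counts = char_ngram_counts(generated)
ref_counts = char_ngram_counts(reference)
print(kl_divergence(gen_counts, ref_counts))
```

Smoothing keeps the divergence finite when an n-gram appears in only one of the two lists; a lower value means the generated forms look more like the reference at the character level.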

Papers · Benchmarks
SIG 72 · HYP 15

arXiv cs.LG

Designing Active Tether-Net Systems for Space Debris Capture with Graph-Learning-Aided Mixed-Combinatorial Optimization

Active tether-net system for space debris capture that uses a Graph Neural Network (GNN) to jointly optimize net morphology, thruster masses of the maneuverable units, and controller aiming points. The GNN reduces the mixed combinatorial nonlinear programming (MCNLP) problem to a nonlinear programming (NLP) problem, which is solved via Particle Swarm Optimization with gradient-based refinement, achieving faster convergence than solving the MCNLP directly.
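
As a rough illustration of the "Particle Swarm Optimization with gradient-based refinement" step only, here is a minimal sketch on a toy two-variable objective. It does not model the tether-net problem or the GNN; the objective, the swarm hyperparameters (inertia 0.7, acceleration 1.5), and all function names are assumptions chosen for the example.

```python
import random


def objective(x):
    # Toy stand-in for the NLP cost: a quadratic bowl with minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def gradient(x):
    # Analytic gradient of the toy objective.
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


def pso(f, dim=2, n_particles=20, iters=50, seed=0):
    """Plain global-best PSO; returns the best position found."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos


def refine(x, grad, lr=0.1, steps=100):
    """Polish a candidate with fixed-step gradient descent."""
    x = x[:]
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x


coarse = pso(objective)              # global, derivative-free search
solution = refine(coarse, gradient)  # local gradient-based polish
print(solution, objective(solution))
```

The split reflects a common design choice: the swarm explores broadly without derivatives, and the gradient step then tightens the best candidate once it sits in a smooth basin.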

Papers · Reasoning
SIG 72 · HYP 15

arXiv cs.LG

Causal Intelligence for Constraint-Aware Intervention Design to Induce State Transitions

COAST is a causal-intelligence approach for designing constrained interventions that induce state transitions. The system learns context-specific causal graphs, attributes distributional shifts to mechanism-level causal drivers, and uses multi-objective optimization balancing transition efficacy, intervention complexity, and target-state stability. Validated on synthetic benchmarks and real biological datasets.

Reasoning · Benchmarks
SIG 72 · HYP 18

arXiv cs.LG

LoRe: Adaptive Interaction-Evaluation Routing with Per-Step Interaction Budgets for Iterative Graph Solvers

LoRe is a training-free inference-time wrapper that optimizes diffusion-based neural solvers for combinatorial optimization. It enforces per-step interaction-evaluation budgets, dynamically routing computation to high-conflict or high-uncertainty interactions. On MIS, LoRe achieves an 8× speedup and a 12× peak-memory reduction; on TSP (n=1000), a 15× speedup and a 44× memory reduction.

Reasoning · Benchmarks · Papers
SIG 72 · HYP 18

arXiv cs.LG

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Study of LLM reward design failures in sparse structured RL. The authors identify two dominant failure modes (reward flooding and semantic misunderstanding) and propose diagnostic-driven iterative refinement. On MiniGrid, success on DoorKey-8x8 improves from 2.3% to 97.6% and on KeyCorridor from 31.2% to 86.7%. The failure-mode taxonomy is reported as the primary mechanism behind these gains.

Reinforcement learning · Llama · Prompt engineering
SIG 72 · HYP 18

arXiv cs.AI

Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility

New Data-Model Compatibility (DMC) metric to assess dataset suitability for reasoning distillation to smaller models. DMC jointly considers data quality, relative difficulty, and student model capability. Validation across multiple student models and tasks shows strong correlation with distillation performance and improvements via dynamic dataset selection during training.

Reasoning · Fine-tuning · Benchmarks
SIG 72 · HYP 18

arXiv cs.LG

TaxDistill: Improving Metagenomic Taxonomic Annotation via Distilled Genomic Foundation Models

TaxDistill applies knowledge distillation to improve metagenomic taxonomic annotation. GenomeOcean, a 500M-parameter genomic foundation model, generates soft labels to train a lightweight student network, reducing noise from initial retrieval tools. Evaluated on 7 CAMI2 datasets; on the Gastrointestinal dataset, TaxDistill improves MMseqs2's F1 score from 0.763 to 0.941.

Papers · Fine-tuning · Benchmarks
SIG 72 · HYP 18

arXiv cs.CL

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

eXTC combines structured prompt optimization and reinforcement learning for text classification. The system learns a natural language rulebook first, then distills reasoning from a teacher LLM into a compact model, then expands capabilities via RL. Result: fast inference with local reasoning traces and global modular explanations of learned domain rules.

Prompt engineering · Reinforcement learning · Reasoning
SIG 72 · HYP 28

arXiv cs.AI

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

Redpanda introduces an Agentic Data Plane architecture using out-of-band metadata channels to enforce security policies, data classifications, and behavioral constraints outside the agent's read/write path. These channels prevent hallucinations and adversarial manipulation while maintaining tamper-proof audit trails. Demonstrated with a multi-agent portfolio rebalancing system.

AI Agents · Multi-agent · AI safety
SIG 72 · HYP 28

arXiv cs.AI

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

The Cognitive Categorical Transformer (CCT), a 306M-parameter model that augments GPT-2 Small, incorporates category-theoretic and cognitive-science-inspired components. On WikiText-103, CCT achieves a validation perplexity of 21.27 versus 24.19 for the GPT-2 Small baseline, a 12% relative reduction (2.92 PPL). Ablations show that simplicial message passing accounts for 84% of the improvement.
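
The quoted absolute and relative reductions can be checked from the two perplexity figures in the summary. The snippet below is a small sanity check; the helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


baseline_ppl = 24.19  # GPT-2 Small validation perplexity, as quoted above
cct_ppl = 21.27       # CCT validation perplexity, as quoted above

absolute_drop = baseline_ppl - cct_ppl
relative_drop = relative_reduction(baseline_ppl, cct_ppl)

print(f"absolute: {absolute_drop:.2f} PPL, relative: {relative_drop:.1%}")
# prints "absolute: 2.92 PPL, relative: 12.1%"
```

Both values agree with the summary's rounded figures of 2.92 PPL and 12%.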

GPT · Papers · Benchmarks
SIG 72 · HYP 25