Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

Framework for processing long documents via parallel chunking and evidence-anchored consolidation. Reduces omission error by 84%, increases evidence traceability by 130%, decreases unsupported claims by 91%. Smaller models benefit most.

Reasoning Benchmarks Papers

arXiv cs.LG·May 21

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

Repeating a smaller dataset during training accelerates learning compared to using a larger dataset, via sampling biases that enable favorable layer-wise growth. Effect observed across algorithmic tasks, architectures and optimizers. Authors provide theoretical analysis and empirical interventions.

Papers Reasoning Reinforcement learning

arXiv cs.LG·May 21

Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification

New approach for node classification in partially labeled graphs. Authors propose Transductive Sharpening (TS), a loss-level modification that minimizes prediction entropy on unlabeled nodes while counterbalancing effects on labeled nodes. Consistent improvements across multiple benchmarks without architectural changes.

Benchmarks Papers
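
The entropy-minimization idea in the summary above can be illustrated with a minimal sketch. It assumes a plain weighted sum of supervised cross-entropy on labeled nodes and mean prediction entropy on unlabeled nodes; the `weight` parameter and all function names are hypothetical, and the paper's counterbalancing term for labeled nodes is not reproduced here.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one labeled node."""
    return -math.log(probs[label])

def entropy(probs):
    """Shannon entropy of one predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def sharpened_loss(labeled, unlabeled, weight=0.5):
    """Supervised loss plus a weighted entropy penalty on unlabeled nodes.

    labeled:   list of (probs, label) pairs
    unlabeled: list of probs
    weight:    hypothetical trade-off coefficient (an assumption, not from the paper)
    """
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    sharpening = sum(entropy(p) for p in unlabeled) / len(unlabeled)
    return supervised + weight * sharpening

labeled_nodes = [([0.9, 0.1], 0)]
confident = sharpened_loss(labeled_nodes, [[0.99, 0.01]])
uncertain = sharpened_loss(labeled_nodes, [[0.5, 0.5]])
print(confident < uncertain)
```

Under this sketch, confident predictions on unlabeled nodes lower the loss, which is the "sharpening" pressure the summary describes.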

arXiv cs.LG·May 21

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

Neural framework to estimate pairwise conditional mutual information directly from hidden states of masked diffusion models (MDMs). The estimator captures the model's internal dependency structure and enables MI-guided parallel decoding, reducing inference forward passes by 3-5x on Sudoku and protein sequence generation (ESM-C) while preserving quality.

Reasoning Code generation Papers

arXiv cs.AI·May 21

Generative Recursive Reasoning

GRAM (Generative Recursive reAsoning Models) extends recursive reasoning models by replacing deterministic latent trajectories with probabilistic multi-trajectory computation. Trained with amortized variational inference, GRAM outperforms recurrent and recursive baselines on structured reasoning and multi-solution constraint satisfaction tasks.

Reasoning Papers

arXiv cs.LG·May 21

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

SELFCI is a complementary self-distillation framework optimizing two independent reverse KL divergences to align LLMs with Contextual Integrity (CI). The system preserves task-relevant information while minimizing inappropriate disclosures, without costly external supervision, outperforming GRPO and other baselines.

Reinforcement learning Alignment AI safety

arXiv cs.CL·May 21

Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks

Study on collocational bootstrapping: mechanism where regularities in word co-occurrence patterns provide cues to syntactic dependencies. Neural networks trained on synthetic datasets varying in subject-verb pairing predictability. Results suggest this mechanism could explain subject-verb agreement acquisition in children.

Papers Reasoning Benchmarks

arXiv cs.CL·May 21

Stage-Audit: Auditable Source-Frontier Discovery for Cross-Wiki Tables

Stage-Audit detects hallucinations in LLM-curated tables by enforcing curator-auditor separation and row-level source verification. On 51 Seed2Frontier instances, precision improves from 0.356 to 0.505 (+42%) and F1 from 0.334 to 0.451 (+35%), with explicit per-row source traceability.

Papers RAG Evals
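
The percentage gains quoted above are relative improvements over the baseline scores. A short check, using only the four figures from the summary (the helper name `relative_gain` is illustrative):

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

precision_gain = relative_gain(0.356, 0.505)
f1_gain = relative_gain(0.334, 0.451)

print(f"precision: +{precision_gain:.0f}%")
print(f"F1: +{f1_gain:.0f}%")
```

Both values round to the reported +42% and +35%.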

arXiv cs.CL·May 21

When Reasoning Supervision Hurts: TTCW-Based Long-Form Literary Review Generation

Study on generating long-form literary reviews based on Torrance Test of Creative Writing (TTCW). Dataset of 263,911 stories annotated across 14 creativity dimensions. Fine-tuning Qwen3 (4B and 8B) shows non-reasoning supervision achieves better performance (0.6820), while reasoning-supervised models fail to complete the required 14-metric review format.

Qwen Fine-tuning Reasoning

arXiv cs.CL·May 21

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

LFD (LLM-assisted Feature Discovery) method generates interpretable text representations via inter-annotator agreement (Cohen's κ) and label disentanglement. Validated on 10 text classification tasks: features clearer and less label-entangled than bottleneck baseline, confirmed by human audit (232 raters).

Evals Papers
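
For readers unfamiliar with the agreement statistic named above, here is a minimal sketch of the standard two-rater Cohen's κ. It shows only the textbook formula; how LFD applies it to candidate features is not described in the summary, and the function name is illustrative.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(kappa)
```

In this toy example the raters agree on 3 of 4 items (0.75) against a chance agreement of 0.5, so κ is about 0.5.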

arXiv cs.CL·May 21

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

ProxyCoT, a chain-of-thought fine-tuning method, improves reasoning on long contexts (up to 10M tokens) by transferring reasoning capabilities from short proxy contexts to full contexts via RL/distillation then supervised fine-tuning. Performance gains with reduced computational overhead and cross-domain generalization.

Reasoning Fine-tuning Reinforcement learning

arXiv cs.LG·May 21

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

Plug-and-play framework to convert Transformer nonlinear operators into spiking neural network (SNN) compatible operations. Decomposes Softmax, SiLU, and normalization into primitives (division, exponentiation, L2 norms) executable by LIF neuron groups without fine-tuning. <1% accuracy drop on LLM benchmarks.

Reasoning Benchmarks Papers

arXiv cs.LG·May 21

Closed-form predictive coding via hierarchical Gaussian filters

Novel predictive coding approach via hierarchical Gaussian filters. Authors restore precision-weighted message passing, enabling simultaneous learning of activations, weights, and precisions without global error signals. On FashionMNIST, the method converges faster than backpropagation while retaining biological advantages of predictive coding.

Reasoning Alignment Papers

arXiv cs.CL·May 21

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

DEL (Digit Entropy Loss) is a novel loss function to improve numerical prediction in LLMs. Tested on CodeLlama, Mistral, DeepSeek, and Qwen-2.5 across 7 mathematical reasoning benchmarks, it outperforms existing methods (MLE, Number Token Loss) by optimizing digit entropy in a supervised manner and generalizing to floating-point numbers.

Papers Benchmarks Fine-tuning

arXiv cs.LG·May 21

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

Theoretical work on piecewise-stationary low-rank linear contextual bandits with drifting subspaces. Introduces SPSC algorithm combining isotropic probes with windowed projected ridge-UCB, achieving dynamic regret Õ(r√T) instead of Õ(d√T). Characterizes identification boundary for moving subspace recovery and validates on 11 benchmarks (synthetic, MovieLens, clinical, ZOZOTOWN production logs).

Reinforcement learning Papers Benchmarks

arXiv cs.CL·May 21

Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs

arXiv study shows LLMs over-idealize the experiences of people with disabilities when generating social media content, producing unrealistic positive stereotypes. Comparative analysis also reveals negative bias: certain topics (career, entertainment) are overrepresented for nondisabled individuals, reinforcing exclusionary narratives.

Alignment AI safety Benchmarks

arXiv cs.CL·May 21

When Irregularity Helps: A Subclass Analysis of Inductive Bias in Neural Morphology

Study on inductive biases in neural morphological generation. Analysis of Japanese past-tense verb inflection reveals a rare irregular subclass (<1% of data) accounts for disproportionate error concentration. Controlled ablations show removing this subclass improves generalization more than eliminating all irregular verbs.

Papers Evals Benchmarks

arXiv cs.CL·May 21

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

Study of synchronization in full-duplex dialogue models (Moshi) that listen and speak simultaneously. Researchers measure internal representation alignment via CKA and detect anticipatory turn-taking cues. Synchronization is strong without noise, degrades with noise, and internal states encode predictive information for turn-taking.

Voice AI Agents Papers

arXiv cs.CL·May 21

Findings of the Counter Turing Test: AI-Generated Text Detection

Counter Turing Test evaluates AI-generated text detection techniques. On Task A (binary classification), the top result reaches F1 = 1.0 at distinguishing human from AI text. On Task B (model attribution), the top result reaches 0.9531 at identifying GPT-4, Claude 3.5, and Llama. Top approaches combine DeBERTa, BART, fine-tuning, and ensemble learning.

Benchmarks GPT Claude

arXiv cs.LG·May 21

Physics-informed convolutional neural networks for fluid flow through porous media

CNN encoder-decoder framework predicts pore-scale velocity fields in porous media directly from geometry. Custom loss function enforces velocity reconstruction, incompressibility, no-flow conditions, and physical constraints. Tested on out-of-distribution geometries; accelerates Lattice-Boltzmann simulations in 90% of cases.

Papers Benchmarks Vision

arXiv cs.LG·May 21

FBOS-RL: Feedback-Driven Bi-Objective Synergistic Reinforcement Learning

FBOS-RL introduces a feedback-driven bi-objective reinforcement learning framework to improve large-scale model training. The framework combines two mutually reinforcing objectives: Exploitation-oriented Policy Alignment (EPA) and Exploration-oriented Capability Cultivation (ECC). Experiments show FBOS-RL converges faster than GRPO with higher performance ceilings.

Reinforcement learning Reasoning Papers

arXiv cs.CL·May 21

Refining and Reusing Annotation Guidelines for LLM Annotation

LLMs struggle to follow specialized conventions of gold-standard benchmarks. Authors propose an iterative moderation framework that reuses and refines annotation guidelines as an alignment mechanism. Testing on three biomedical NER tasks (NCBI Disease, BC5CDR, BioRED) with GPT, Gemini, DeepSeek confirms efficacy of guideline integration and reasoning-optimized models.

GPT Gemini DeepSeek

arXiv cs.LG·May 21

Instance Discrimination for Link Prediction

Two new self-supervised learning models for link prediction in graphs: L-GRACE and L-BGRL. Based on link representations instead of node representations, they incorporate structural augmentation grounded in community structure. Performance matches state-of-the-art in both supervised and self-supervised settings.

Papers Benchmarks RAG

arXiv cs.CL·May 21

Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

DPR-BAG generates abstracts for biomedical articles lacking summaries through structured decomposition (BOMRC schema), parallel LLM-based summarization, and refinement. On PMC-MAD (46,309 articles), improves abstractive novelty while maintaining factual consistency. Training-free, zero-shot framework.

Prompt engineering RAG Benchmarks

arXiv cs.LG·May 21

WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning

WaveGraphNet is a coupled inverse-forward graph learning framework for guided-wave damage localization in CFRP plates. The model represents piezoelectric transducers as graph nodes and uses a forward branch as a physics-consistent regularizer to improve generalization to unseen regions of the structure.

Papers Benchmarks

arXiv cs.LG·May 21

GraphDiffMed: Knowledge-Constrained Differential Attention with Pharmacological Graph Priors for Medication Recommendation

GraphDiffMed presents a medication recommendation framework using dual-scale Differential Attention v2 with pharmacological constraints. Tested on MIMIC-III, the model filters noise at intra-visit and inter-visit levels while incorporating drug-drug interactions, outperforming baselines on recommendation quality and safety metrics.

Benchmarks Papers AI safety

Reddit r/LocalLLaMA·May 20

Try ik_llama.cpp with MTP if you have limited VRAM. You will be pleasantly surprised!

ik_llama.cpp outperforms llama.cpp on MTP with RTX 4070 Super 12GB. Using Qwen3.6-35B-A3B-IQ4_XS, user achieves 110.24 tok/s average and 87.49% acceptance rate. Optimized configuration provided with specific cache and quantization parameters.

Llama Qwen Multi-agent

Latent Space·May 20

Railway: The Agent-Native Cloud — Jake Cooper

Railway, agent-native cloud platform, reaches 3M users with 100K signups/week. Own-metal infrastructure, $200K+ coding agent spend, elimination of traditional pull requests.

AI Agents Code generation Infrastructure

Reddit r/MachineLearning·May 20

OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]

OpenAI claims a general-purpose reasoning model found a counterexample to Erdős's unit-distance conjecture in discrete geometry. The model constructed planar point sets with more than n^{1+δ} unit distances, disproving the conjectured upper bound. The proof was verified by an AI grading pipeline and reviewed by mathematicians.

Reasoning OpenAI Papers

Reddit r/MachineLearning·May 20

under 2% quality gap but 10x cost difference: tested 5 models on identical tool calling tasks [D]

Benchmark of 5 models (Opus 4.7, GPT-5, Sonnet 4.6, DeepSeek V4 Pro, Hunyuan Hy3) on 8 Python refactoring tasks with MCP. Quality gap <2% (96-99% first-attempt tool call success) but 10x cost difference: Opus $15, GPT-5 $11, Sonnet $4, DeepSeek <$2, Hunyuan $1.50.

MCP AI Agents Code generation
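
The "10x" figure follows from the most and least expensive models listed above. A quick check using the quoted dollar amounts; DeepSeek V4 Pro is left out because only an upper bound (<$2) is given, and the summary does not state what unit of work each cost covers.

```python
# Costs in USD as quoted in the summary above.
costs_usd = {
    "Opus 4.7": 15.00,
    "GPT-5": 11.00,
    "Sonnet 4.6": 4.00,
    "Hunyuan Hy3": 1.50,
}

cheapest = min(costs_usd.values())
# Cost of each model expressed as a multiple of the cheapest one.
ratios = {model: cost / cheapest for model, cost in costs_usd.items()}

for model, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{model}: {ratio:.1f}x")
```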

The Decoder·May 20

Google tests the app market version of the SaaSpocalypse

Google AI Studio now generates native Android apps from prompts in Kotlin with Jetpack Compose, testable in a browser emulator. For simple utility apps like trackers and checklists, the Play Store could become less relevant. Apple blocks AI-generated apps.

DeepMind Code generation Tools

Reddit r/MachineLearning·May 20

NOML-NOML: hierarchical TD3 + anchor policy for flight control [P]

Custom RL algorithm NOML for continuous 6-DoF flight control. Combines TD3 with anchor policy (fixed safe action), hierarchical actor (3 independent MLPs pitch→roll→rest), and mirror learning (left-right symmetry). Solves vanilla TD3 oscillation collapse. Open-sourced under Apache 2.0.

Reinforcement learning Code generation Robotics

The Decoder·May 20

Google pairs its Genie world model with Street View to create explorable AI worlds based on real places

Google DeepMind integrates its Genie 3 world model with Street View data: users drop a pin on a map and explore an AI-generated world based on a real place. Street View becomes a strategic training resource for AI agents and robots.

DeepMind AI Agents Robotics

GitHub Trending·May 20

lance-format/lance

Lance is an open lakehouse format for multimodal AI. Converts from Parquet in 2 lines of code with 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, PyTorch.

Vector search Embeddings Open source

The Decoder·May 20

Google's Gemini 3.5 Flash follows Anthropic and OpenAI in making newer AI models significantly pricier

Gemini 3.5 Flash costs 5.5x more than its predecessor in benchmark testing. On agent tasks, its total costs exceed those of Gemini 3.1 Pro by 75% because it requires more interaction steps. Google follows an industry trend: newer models from Anthropic and OpenAI have also become significantly pricier.

Gemini AI Agents Benchmarks

Vercel AI Blog·May 20

Grok Build 0.1 now available on Vercel AI Gateway

Grok Build 0.1, a beta coding model trained for agentic coding, is now available on Vercel AI Gateway. The model runs with non-configurable reasoning effort and no non-reasoning mode. Vercel AI Gateway provides a unified API for model calls, usage and cost tracking, with intelligent provider routing and automatic retries.

Code generation AI Agents Reasoning

Reddit r/LocalLLaMA·May 20

Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint]

Guardrails improve an 8B model from 53% to 99% on agentic tasks, according to an ACM CAIS '26 preprint. The technique enhances control and reliability of AI agents.

AI Agents AI safety Benchmarks

arXiv cs.AI·May 20

Agentic Trading: When LLM Agents Meet Financial Markets

Systematic review of 77 studies on LLM agents in financial trading. Only 19 studies meet the minimum criteria (action output + closed-loop evaluation). Key findings: no comparable protocols, insufficient reproducibility (no R3 studies), and missing documentation of transaction costs and universe handling.

AI Agents Papers Evals

arXiv cs.LG·May 20

Multi-Pedestrian Safety Warning at Urban Intersections Use Case of Digital Twin

Multi-pedestrian safety warning system for urban intersections, built on a tightly coupled digital twin framework that combines camera and ultra-wideband sensors with trajectory prediction modeling. Deployed on the COSMOS testbed in New York City, it delivers real-time alerts via edge-cloud computing and significantly reduces response times for vulnerable road users.

Vision Infrastructure AI safety

arXiv cs.AI·May 20

Efficient Elicitation of Collective Disagreements

Theoretical paper on efficient elicitation of collective disagreement among voters. Introduces the plurality matrix, a generalization of pairwise comparisons, to identify minimal aggregated preference information needed for disagreement measures. Shows that certain measures (rank-variance, divisiveness) require subset size 3, not just pairwise comparisons.

Papers Evals
