May 2026

3149 articles

CheckSupport: A Local LLM-Powered Tool for Automated Manuscript Submission Checklist Selection and Completion

CheckSupport is an open-source system that uses locally deployed LLMs to automate the recommendation and completion of reporting checklists for scientific manuscripts. Evaluated on peer-reviewed manuscripts, it achieves 90% accuracy for checklist recommendation and 88% for item-level completion, processing each manuscript in 12.5 seconds on CPU-only hardware.

Llama, Prompt engineering, Evals

CheckSupport: A Local LLM-Powered Tool for Automated Manuscript Submission Checklist Selection and Completion

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

ReTAMamba: Reliability-Aware Temporal Aggregation with Mamba for Irregular Clinical Time Series Prediction

GPU-Accelerated Deep Learning for Heatwave Prediction and Urban Heat Risk Assessment

ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage

Are Sparse Autoencoder Benchmarks Reliable?

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

EAGT: Echocardiography Augmentation for Generalisability and Transferability

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

MR-SLAM: Immersive Spatial Supervision for Multi-Robot Mapping via Mixed Reality

KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy

Hierarchical Two-Stage Framework for Environment-Aware Long-Horizon Vessel Trajectory Prediction

Diffusion Attention Expert Model for Predicting and Semi-automatic Localizing STAS in Lung Cancer Histopathological Images

PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics

Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference

Membership Inference Attacks on Discrete Diffusion Language Models

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

PESD-TSF: A Period-Aware and Explicit Structured Decomposition Framework for Long-Term Time Series Forecasting

Peak-Detector: Explainable Peak Detection via Instruction-Tuned Large Language Models in Physiological Signals

Identifiable Token Correspondence for World Models

Breaking the accuracy-resource dilemma: a lightweight adaptive video inference enhancement

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

LERA: LLM-Enhanced RAG for Ad Auction in Generative Chatbots

Optimising CSRNet with parameter-free attention mechanisms for crowd counting in public transport

WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI

MoleCode unlocks structural intelligence in large language models

Visual Agentic Memory: Enabling Online Long Video Understanding via Online Indexing, Hierarchical Memory, and Agentic Retrieval

Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions

Inventorship in AI-Assisted Inventions: Designing an Experiment to Shape Case Law

Isotonic Survival Regression: Calibrated Survival Distributions from Deep Cox Models

Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap

HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support

GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering

PromptDecipher: Supporting AI Tutor Authoring Through Editable Simulated Interactions

LoopQ: Quantization for Recursive Transformers

Graph Hierarchical Recurrence for Long-Range Generalization

Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control

DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models

CrossView Suite: Harnessing Cross-view Spatial Intelligence of MLLMs with Dataset, Model and Benchmark

An Assessment of Human vs. Model Uncertainty in Soft-Label Learning and Calibration

PIMSM: Physics-Informed Multi-Scale Mamba for Stable Neural Representations under Distribution Shift

PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Distilling Tabular Foundation Models for Structured Health Data

Towards an Inferentialist Account of Information Through Proof-theoretic Semantics

Bi-Level Chaotic Fusion Based Graph Convolutional Network for Stock Market Prediction Interval

Code as Agent Harness

A Machine with Short-Term, Episodic, and Semantic Memory Systems

Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs

Evaluating AI Alignment in LLMs: Output Analysis of Value Priorities Across 75 Models with Human Benchmarking

Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning

GVGAI-LLM: Evaluating Large Language Model Agents with Infinite Games

Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning

Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings

A New Perspective on Precision and Recall for Generative Models

MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?

PersonaDual: Balancing Personalization and Objectivity via Adaptive Reasoning

Agentic AI Governance and Lifecycle Management in Healthcare

Real-Time Aligned Reward Model beyond Semantics

Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent

RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

LiTS: A Modular Framework for LLM Tree Search

Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity

How Wrong Can Your Counterfactual Be? Quantifying Confounding Bias for Continuous Treatments without a Control Group

Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

Can LLM Agents Be CFOs? Benchmarking Long-Horizon Resource Allocation in an Uncertain Enterprise Environment

Spatiotemporal Robustness of Temporal Logic Tasks using Multi-Objective Reasoning

Can Heterogeneous Language Models Be Fused?

Language Game: Talking to Non-Human Systems

LEAF: A Living Benchmark for Event-Augmented Forecasting

EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models

LEGO: An LLM Skill-Based Front-End Design Generation Platform

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

Robust Agent Compensation (RAC): Teaching AI Agents to Compensate