Page 28 of 192

PhysioSeq2Seq: A Hybrid Physiological Digital Twin and Sequence-to-Sequence LSTM for Long-Horizon Glucose Forecasting in Type 1 Diabetes

PhysioSeq2Seq combines patient-specific physiological digital twin modeling with a Seq2Seq LSTM for 240-minute glucose forecasting in type 1 diabetes. Trained on 348 participants (T1DEXI) and evaluated on 74, it reaches an MAE of 39.28 mg/dL at the 240-min horizon and reduces bias by 13.89 mg/dL compared with a recursive LSTM.

Reasoning Reinforcement learning Benchmarks
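The two headline numbers above are an error magnitude (MAE) and a bias. Below is a minimal sketch of how both can be computed at a single forecast horizon. It assumes that "bias" means mean signed error (predicted minus observed) and that "bias reduction" means the drop in absolute bias between two models. The forecast values are hypothetical toy numbers, not data from the paper, and `horizon_errors` is an illustrative helper name.

```python
from typing import Sequence


def horizon_errors(predicted: Sequence[float], observed: Sequence[float]) -> tuple[float, float]:
    """Return (MAE, mean signed bias) in mg/dL for forecasts at one horizon.

    Assumption: bias = mean(predicted - observed); a positive value means
    the model over-predicts glucose on average.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equally sized, non-empty sequences")
    diffs = [p - o for p, o in zip(predicted, observed)]
    mae = sum(abs(d) for d in diffs) / len(diffs)  # average error magnitude
    bias = sum(diffs) / len(diffs)                 # average signed error
    return mae, bias


# Hypothetical 240-min-ahead forecasts vs. CGM readings (mg/dL); toy values only.
observed = [140, 200, 140, 100]
seq2seq_pred = [150, 210, 130, 100]
recursive_pred = [170, 230, 150, 110]

mae_s, bias_s = horizon_errors(seq2seq_pred, observed)
mae_r, bias_r = horizon_errors(recursive_pred, observed)
print(f"seq2seq:   MAE={mae_s:.2f}  bias={bias_s:+.2f}")
print(f"recursive: MAE={mae_r:.2f}  bias={bias_r:+.2f}")
# Assumed definition of "bias reduction": drop in absolute mean signed error.
print(f"bias reduction: {abs(bias_r) - abs(bias_s):.2f} mg/dL")
```

The recursive toy forecasts drift consistently upward, so their errors do not cancel and the bias stays large. This is the error-accumulation pattern that a direct multi-step (Seq2Seq) decoder is generally meant to reduce.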

arXiv cs.CL·May 19

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

Beacon is a diagnostic benchmark that measures sycophancy (LLMs' tendency to prioritize agreement with the user over factual accuracy) across 12 SOTA models. The authors identify stable linguistic and affective sub-biases that scale with model capacity, and they propose prompt-level and activation-level interventions to modulate them.

Alignment AI safety Evals

arXiv cs.CL·May 19

ShareChat: A Dataset of Chatbot Conversations in the Wild

ShareChat is a corpus of 142,808 conversations (660,293 turns) collected from ChatGPT, Perplexity, Grok, Gemini, and Claude between April 2023 and October 2025. The dataset preserves native affordances (citations, reasoning traces, code artifacts) across 95 languages and enables analysis of cross-platform differences in intent satisfaction, citation strategies, and response latency.

Benchmarks Evals Papers

arXiv cs.CL·May 19

Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models

A new Ancient-to-Modern Greek (AG-MG) parallel corpus with 132,481 sentence pairs. The creation pipeline combines web scraping, VecAlign alignment with fine-tuned LaBSE embeddings, and LLM-based correction with Gemini 2.5 Flash. The benchmark covers NMT models (NLLB, M2M100) and a Greek LLM (Llama-Krikri-8B): full fine-tuning achieves 13.16 BLEU, with gains of up to +10.3 points.

Benchmarks Fine-tuning Embeddings
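VecAlign-style alignment pairs sentences by running dynamic programming over embedding similarities. Below is a simplified stand-in, not VecAlign itself. It only produces 1-1 pairs or skips. The real tool also considers multi-sentence alignments and uses a more elaborate cost. It takes sentence embeddings as plain float vectors, which in the pipeline above would come from fine-tuned LaBSE. The names `align_monotonic` and `skip_penalty`, and the penalty value, are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_monotonic(src: Sequence[Vector], tgt: Sequence[Vector],
                    skip_penalty: float = 0.3) -> list[tuple[int, int]]:
    """Monotonic 1-1 sentence alignment maximizing summed cosine similarity.

    Each unaligned sentence (on either side) costs `skip_penalty` (assumed value).
    """
    n, m = len(src), len(tgt)
    # score[i][j] = best total score for aligning src[:i] with tgt[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - skip_penalty
        back[i][0] = "up"    # skip a source sentence
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - skip_penalty
        back[0][j] = "left"  # skip a target sentence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + cosine(src[i - 1], tgt[j - 1]), "diag"),
                (score[i - 1][j] - skip_penalty, "up"),
                (score[i][j - 1] - skip_penalty, "left"),
            ]
            score[i][j], back[i][j] = max(options)
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: list[tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy 2-d "embeddings": the middle target sentence has no source counterpart.
print(align_monotonic([[1, 0], [0, 1]], [[1, 0], [-1, -1], [0, 1]]))
```

Monotonicity reflects the assumption that both texts present content in the same order. The skip penalty controls how readily the aligner drops a sentence instead of forcing a low-similarity pair.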

PhysioSeq2Seq: A Hybrid Physiological Digital Twin and Sequence-to-Sequence LSTM for Long-Horizon Glucose Forecasting in Type 1 Diabetes

Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

ShareChat: A Dataset of Chatbot Conversations in the Wild

Ancient Greek to Modern Greek Machine Translation: A Novel Benchmark and Fine-Tuning Experiments on LLMs and NMT Models

Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention

AgentKernelArena: Generalization-Aware Benchmarking of GPU Kernel Optimization Agents

(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

Causely: A Causal Intelligence Layer for Enterprise AI. A Benchmark Study on SRE and Reliability Workflows

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

Fix the Structural Bottleneck: Context Compression via Explicit Information Transmission

The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting

Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token (TTFT)

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Context Memorization for Efficient Long Context Generation

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Attractor-Vascular Coupling Theory: Formal Grounding and Empirical Validation for AAMI-Standard Cuffless Blood Pressure Estimation from Smartphone Photoplethysmography

How Few-Shot Examples Add Up: A Causal Decomposition of Function Vectors in In-Context Learning

PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning

Closing the Gap at CRAC 2026: Two-Stage Adaptation for LLM-Based Multilingual Coreference Resolution

Wavelet Flow Matching for Multi-Scale Physics Emulation

Automatic Generation of High-Performance RL Environments

MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

Embodied Task Planning via Graph-Informed Action Generation with Large Language Models

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference

Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains

Strategic Over-Parameterization for Generalizable Low-Rank Adaptation

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Asking Back: Interaction-Layer Antidistillation Watermarks

Locally Coherent Parallel Decoding in Diffusion Language Models

Physics-Guided Geometric Diffusion for Macro Placement Generation

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra