
CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

CAIT is an open-source toolkit for the syntactic analysis of child-adult interactions in CHILDES. It includes a dependency parser trained on UD-English-CHILDES, a POS tagger, and a construction tagger. On this specialized domain, its parser outperforms spaCy and Stanza.

Open source Benchmarks


arXiv cs.CL · May 20

SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

SciCustom is a framework for building custom benchmarks that evaluate specific scientific capabilities of LLMs. It organizes scientific knowledge into ontological units and uses multi-model consensus to identify the relevant units. It then generates benchmarks from real-world chemistry and health data without expert annotation.

Benchmarks Evaluations Papers


Reddit r/LocalLLaMA · May 19

Carbon: Decoding the Language of Life

Hugging Face has released Carbon, a family of open-source DNA foundation models. Carbon-3B matches the state of the art (Evo2-7B) while running 275× faster. The approach adapts modern LLM techniques: deterministic 6-mer tokenization, a factorized loss (FNS) during mid-training, and curation of functional biological data.

Open source Benchmarks Fine-tuning
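The summary gives no details on how Carbon's 6-mer tokenization works. As a rough illustration only, here is a minimal sketch of one possible scheme. It assumes non-overlapping 6-base chunks and a hypothetical fixed vocabulary of all 4^6 = 4096 combinations of A, C, G and T. It also assumes a hypothetical `UNK_ID` for short tails or non-ACGT symbols. The names `tokenize_6mer`, `VOCAB` and `UNK_ID` are illustrative and are not Carbon's actual API.

```python
from itertools import product

K = 6          # assumed chunk length for "6-mer" tokenization
BASES = "ACGT"

# Hypothetical fixed vocabulary: every possible 6-mer gets an integer ID (4**6 = 4096 entries).
VOCAB = {"".join(p): i for i, p in enumerate(product(BASES, repeat=K))}
UNK_ID = len(VOCAB)  # hypothetical ID for short tails or chunks containing non-ACGT symbols


def tokenize_6mer(seq: str) -> list[int]:
    """Split a DNA sequence into consecutive, non-overlapping 6-base chunks and map each to an ID."""
    seq = seq.upper()
    ids = []
    for start in range(0, len(seq), K):
        chunk = seq[start:start + K]
        # Full-length chunks made only of A/C/G/T have their own ID; anything else maps to UNK_ID.
        ids.append(VOCAB.get(chunk, UNK_ID))
    return ids


if __name__ == "__main__":
    print(tokenize_6mer("ACGTACGTACGT"))  # two full chunks: "ACGTAC", "GTACGT"
    print(tokenize_6mer("AAAAAAC"))       # one full chunk plus a 1-base tail mapped to UNK_ID
```

In this sketch the split depends only on position, and the vocabulary is fixed by construction rather than learned from data. So the same sequence always yields the same token IDs. Whether Carbon uses overlapping k-mers, a different tail strategy or extra special tokens is not stated in the summary.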



CAIT: A Syntactic Parsing Toolkit for Child-Adult InTeractions

SciCustom: A Framework for Custom Evaluation of Scientific Capabilities in Large Language Models

Carbon: Decoding the Language of Life

Ontology-Constrained Neural Reasoning in Enterprise Agentic Systems: A Neurosymbolic Architecture for Domain-Grounded AI Agents

KVCapsule: Efficient Sequential KV Cache Compression for Vision-Language Models with Asymmetric Redundancy

FinTagging: Benchmarking LLMs for Extracting and Structuring Financial Information

Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

PH-Dreamer: A Physics-Driven World Model via Port-Hamiltonian Generative Dynamics

Supervising the search process produces reliable and generalizable information-seeking agents

Reasoning Can Be Restored by Correcting a Few Decision Tokens

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Spatial Blindness in Whole-Slide Multiple Instance Learning

Membership Inference Attacks on Discrete Diffusion Language Models

BacktestBench: Benchmarking Large Language Models for Automated Quantitative Strategy Backtesting

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Locally Coherent Parallel Decoding in Diffusion Language Models

GRASP: Graph Agentic Search over Propositions for Multi-hop Question Answering

Surgical Post-Training: Proximal On-Policy Distillation for Reasoning with Knowledge Retention

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

PRISMat: Policy-Driven, Permutation-Invariant Autoregressive Material Generation

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

From Prompts to Protocols: An AI Agent for Laboratory Automation

AgentWall: A Runtime Safety Layer for Local AI Agents

Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents


PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

Improved Baselines with Representation Autoencoders

Language-Switching Triggers Take a Latent Detour Through Language Models

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought

Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex

Trust No Tool: Evaluating and Defending LLM Agents under Untrusted Tool Feedback

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

SomaliWeb v1: A Quality-Filtered Somali Web Corpus with a Matched Tokenizer and a Public Language-Identification Benchmark

Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency