Page 8 of 137

AllHigh signalRecent
5466 articles
arXiv cs.AI·

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

LLM-FACETS is an open-source framework for evaluating LLM factuality, epistemic calibration, and reproducibility. Web interface, plugin architecture, deterministic metrics (BLEU, ROUGE, BERTScore) run locally, log-probability visualization, multi-judge consensus, RAG Triad metrics. Designed for technical experts, domain experts, and compliance officers per EU AI Act and NIST standards.

EvalsAI safetyAlignment
SIG
78
HYP
15
arXiv cs.LG·

DisasterLex: An Expert Concept-to-Schema Knowledge Graph for Geospatial Reasoning in Disaster Analytics

DisasterLex is a knowledge-graph-mediated text-to-SQL framework for querying geospatial disaster-analytics databases. It uses an Expert Knowledge Graph (107 concepts, 117 causal edges) to route natural-language queries across 36 heterogeneous tables. On 75 test queries, it outperforms 4 baselines (LightRAG, HippoRAG 2, ReFoRCE, CHESS) by 1.4x to 2.75x.

RAGReasoningBenchmarks
SIG
78
HYP
15
Reddit r/LocalLLaMA·

I bolted an 8-arm reasoning MoE onto a frozen 1.4B Mamba backbone on a single RTX 3060. Here’s the mechanistic autopsy of what broke and what worked.

A researcher built Mamba-Titan-1.4B-Reasoning (2.54B params MoE) on RTX 3060 by freezing a 1.4B Mamba-1 backbone and adding 8 trainable experts. Trained on DeepSeek CoT traces, the model developed a 'vault door' mechanism: the </think> token isolates at the smallest norm (1.991 vs 4.742 mean) to control latent reasoning termination.

ReasoningFine-tuningOpen source
SIG
78
HYP
35
arXiv cs.AI·

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

Reasoning models maintain factually correct chain-of-thought traces but flip their final answer under sustained adversarial pressure in multi-turn dialogue. This unfaithful capitulation affects ~50% of cases in think mode and 11-15% without reasoning. The effect correlates with reasoning architecture (high in Qwen3-32B and GPT-OSS-20B, low in inline-CoT Gemma-4-31B-it).

ReasoningEvalsAI safety
SIG
78
HYP
25
arXiv cs.CL·

Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

Study of chain-of-thought (CoT) transfer across models using a provider-receiver framework. Full traces often transfer successfully, but mechanisms vary: answer extraction (AIME), receiver competence (MMLU-Pro), or partial structured information (ZebraLogic). In free-generation mode, partial CoTs improve performance, suggesting guidance for continued reasoning.

ReasoningPrompt engineeringBenchmarks
SIG
78
HYP
15