arXiv cs.LG

https://arxiv.org/list/cs.LG/recent

arXiv cs.LG·

Quantized Reasoning Models Think They Need to Think Longer, but They Do Not

Post-training quantization (PTQ) lowers reasoning-model accuracy and lengthens chains of thought (CoT). In 52% of failures, the model reaches the correct answer mid-reasoning but does not output it as its final answer. A training-free logit penalty on overthinking markers ("wait", "but", "alternatively") shortens CoT by 12-23% while preserving accuracy across 5 models (1.5B-32B) and 5 benchmarks.

Reasoning · Fine-tuning · Benchmarks
SIG 78 · HYP 15
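
To illustrate the training-free logit penalty mentioned in the quantized-reasoning summary above, here is a minimal sketch. The summary says only that a penalty is applied to overthinking markers; the additive form, the penalty size of 2.0, the toy vocabulary, and the function names are all assumptions made for illustration, not details from the paper.

```python
# Hypothetical sketch: the additive penalty, its size, and the toy vocabulary
# are assumptions for illustration, not details taken from the paper.

def penalize_markers(logits, marker_token_ids, penalty=2.0):
    """Return a copy of `logits` with `penalty` subtracted at each marker token id."""
    adjusted = list(logits)
    for token_id in marker_token_ids:
        adjusted[token_id] -= penalty
    return adjusted


def greedy_pick(logits):
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


vocab = ["the", "wait", "but", "alternatively", "answer"]
marker_ids = [vocab.index(w) for w in ("wait", "but", "alternatively")]

logits = [1.0, 2.5, 0.25, 0.5, 2.0]  # "wait" would win under greedy decoding
adjusted = penalize_markers(logits, marker_ids)

print(vocab[greedy_pick(logits)])    # wait
print(vocab[greedy_pick(adjusted)])  # answer
```

Because the adjustment happens only at decoding time, no model weights change, which is consistent with the "training-free" description.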
arXiv cs.LG·

RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

RAFT is a two-stage domain fine-tuning method that mitigates catastrophic forgetting. It first refines the training data via self-conditioned rewriting and answer fusion, then applies on-policy distillation, in which the original model provides soft targets on student-generated trajectories. Across five domains, RAFT improves domain accuracy by 23.2% over standard SFT and recovers 18.2% of the degradation on MS-Bench.

Fine-tuning · Reinforcement learning · Papers
SIG 78 · HYP 15
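
The soft-target idea in the RAFT summary above can be sketched as a per-token cross-entropy between the original (teacher) model's distribution and the student's. This is a generic distillation loss under stated assumptions: the cross-entropy form, the function names, and the toy logits are illustrative and may differ from RAFT's actual objective. In an on-policy setup, the positions would come from trajectories sampled by the student.

```python
import math

# Hypothetical sketch of a soft-target distillation loss; the exact objective
# used by RAFT is not given in the summary, so this form is an assumption.

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_target_loss(teacher_logits, student_logits):
    """Mean over positions of the cross-entropy H(teacher, student)."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        t_logp = log_softmax(t_row)
        s_logp = log_softmax(s_row)
        total += -sum(math.exp(tl) * sl for tl, sl in zip(t_logp, s_logp))
    return total / len(teacher_logits)


# Toy logits for two positions of a (pretend) student-generated trajectory.
teacher = [[2.0, 0.0, -1.0], [0.0, 0.0, 0.0]]
student = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

matched = soft_target_loss(teacher, teacher)  # student agrees with teacher
drifted = soft_target_loss(teacher, student)  # student has drifted away
print(drifted > matched)  # True
```

The loss is smallest when the student reproduces the teacher's distribution, which is how this kind of term pulls a fine-tuned model back toward the original model's behavior.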
arXiv cs.LG·

Beyond Augmentation: Score-Guided Pathological Prior for EEG-based Depression Detection

A novel approach to Major Depressive Disorder (MDD) detection from EEG that avoids data augmentation. SGC (Score-Guided Classification) uses an unsupervised generative network to model pathological anomalies as a prior, which is fused with deep feature representations. A Cross-Channel Spatial Adaptation module handles channel heterogeneity across multi-center data. Validated on the Mumtaz2016 and MODMA datasets.

Papers · Evals · Vision
SIG 72 · HYP 28
arXiv cs.LG·

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

An iterative AI workflow optimizes graphite-based anodes through sequential learning and experimental feedback loops. The Citrine Platform generates surrogate models and refines manufacturing constraints. Results: fabrication reliability improved from frequent failures to 100% success, the share of cells reaching ≥350 mAh/g increased from 28.4% to 84.8%, and capacity retention rose from 42.1% to 97.3%.

Reinforcement learning · Benchmarks · Tools
SIG 75 · HYP 15
arXiv cs.LG·

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

ARCA introduces a token-level credit assignment method for LLM reinforcement learning that addresses the degeneracy of intrinsic signals (surprisal, entropy reduction, policy divergence) under LoRA. It measures adapter salience directly via the L2 norm of hidden-state residuals instead of output-distribution shifts. Tested on MATH with Qwen3-1.7B under GRPO, ARCA avoids pathological weight concentration.

Reinforcement learning · Fine-tuning · Reasoning
SIG 75 · HYP 15
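
As a rough illustration of the salience measure in the ARCA summary above, the sketch below takes the per-token L2 norm of the difference between hidden states with and without the adapter. The summary does not say how salience becomes credit, so the normalization step, the function names, and the toy hidden states are hypothetical.

```python
import math

# Hypothetical sketch: only the "L2 norm of hidden-state residuals" comes from
# the summary; turning norms into normalized weights is an assumption.

def adapter_salience(base_hidden, adapted_hidden):
    """Per-token L2 norm of (adapted - base) hidden states."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(a_row, b_row)))
        for b_row, a_row in zip(base_hidden, adapted_hidden)
    ]


def credit_weights(salience):
    """Normalize salience to sum to 1; fall back to uniform if all are zero."""
    total = sum(salience)
    if total == 0.0:
        return [1.0 / len(salience)] * len(salience)
    return [s / total for s in salience]


# Toy hidden states: three tokens, two hidden dimensions each.
base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
adapted = [[3.0, 4.0], [1.0, 1.0], [2.0, 5.0]]

salience = adapter_salience(base, adapted)
print(salience)                  # [5.0, 0.0, 3.0]
print(credit_weights(salience))  # [0.625, 0.0, 0.375]
```

Measuring the residual in hidden-state space sidesteps the output distribution entirely, which matches the summary's point that output-based signals degenerate under LoRA.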
arXiv cs.LG·

Adversarially Robust Control of Conditional Value-at-Risk via Rockafellar-Uryasev Conformal Inference

An online, distribution-free framework for controlling Conditional Value-at-Risk (CVaR) in non-stationary and adversarial environments. It combines conformal tail risk control, online learning, and the Rockafellar-Uryasev variational representation. It offers provable safety guarantees for nonlinear tail risk under arbitrary data-generating processes. Applications: portfolio risk management and LLM toxicity mitigation.

Papers · AI safety · Reasoning
SIG 72 · HYP 15
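
The Rockafellar-Uryasev representation named in the CVaR summary above writes CVaR at level α as the minimum over a threshold t of t + E[(L − t)+] / (1 − α). The sketch below evaluates that objective on a fixed sample of losses. It shows only the offline, empirical form; it is not the paper's online conformal procedure, and the function names and toy losses are illustrative.

```python
# Sketch of the empirical Rockafellar-Uryasev objective on a fixed sample.
# This is not the paper's online method; names and data are illustrative.

def ru_objective(t, losses, alpha):
    """t + mean(max(loss - t, 0)) / (1 - alpha)."""
    excess = sum(max(x - t, 0.0) for x in losses) / len(losses)
    return t + excess / (1.0 - alpha)


def empirical_cvar(losses, alpha):
    """Minimize the RU objective over t.

    The objective is convex and piecewise linear in t with breakpoints at the
    sample values, so checking the sample values is enough.
    """
    return min(ru_objective(t, losses, alpha) for t in losses)


losses = [1.0, 2.0, 3.0, 4.0]
print(empirical_cvar(losses, alpha=0.5))  # 3.5, the mean of the worst half
```

Writing CVaR as a minimization over a single scalar threshold is what makes it a natural fit for online updates, since the threshold can be adjusted as each new loss arrives.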
arXiv cs.LG·

Gait2Hip-60: A Unified Deep Learning Benchmark for Predicting Hip Muscle Forces and Joint Moments from Multi-Cadence Gait Kinematics

Gait2Hip-60 is a unified benchmark comparing LSTM, Transformer, and Mamba models for predicting hip muscle forces and joint moments from gait kinematics. The Transformer outperforms the other models (R²=0.819 for forces, R²=0.862 for moments). External validation on 9 patients with femoral head osteonecrosis shows moderate generalization (R²=0.537–0.569).

Benchmarks · Reasoning
SIG 72 · HYP 18
arXiv cs.LG·

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents

A new counterfactual evaluation metric (CSS) reveals that six frontier models ranked similarly on traditional coverage-based metrics rank in nearly opposite order when assessed on their ability to update clinical recommendations in response to oncology case mutations. All models fail on surgery-status interventions, a safety blind spot invisible to coverage metrics.

Benchmarks · Evals · AI Agents
SIG 82 · HYP 18