arXiv cs.AI·

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

Learn-by-Wire Guard (LBW-Guard) is an autonomous governance layer that supervises the AdamW optimizer during language-model training. Tested on Qwen2.5-7B with WikiText-103, LBW-Guard reduces final perplexity from 13.21 to 10.74 (−18.7%) and accelerates training by 1.10×. Under extreme learning-rate stress (LR=3e-3), AdamW fails (perplexity 1885.24) while LBW-Guard remains stable (11.57).

Qwen · Reinforcement learning · Benchmarks
SIG 72 · HYP 25
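The LBW-Guard figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a check of the reported numbers, not a reproduction of the experiment; the variable names are illustrative, and the baseline is assumed to be the 13.21 perplexity that the summary says was reduced.

```python
def relative_change_pct(baseline: float, new: float) -> float:
    """Signed percentage change from baseline to new."""
    return (new - baseline) / baseline * 100.0


# Final perplexities as reported in the summary.
baseline_ppl = 13.21
guard_ppl = 10.74

change = relative_change_pct(baseline_ppl, guard_ppl)
print(f"perplexity change: {change:.1f}%")  # -18.7%

# Stress setting (LR=3e-3): ratio of the two reported perplexities.
stress_ratio = 1885.24 / 11.57
print(f"stress perplexity ratio: {stress_ratio:.0f}x")
```

The computed change matches the −18.7% stated in the summary.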
arXiv cs.LG·

VCR: Learning Valid Contextual Representation for Incomplete Wearable Signals

VCR is a self-supervised framework that learns robust representations from incomplete wearable sensor signals. It uses an orthogonal tokenizer to disentangle shared semantics from modality-specific residuals, combined with a missing-aware mixture-of-experts backbone. VCR improves performance on health monitoring tasks when one or several modalities are missing.

Papers · Embeddings · Reinforcement learning
SIG 72 · HYP 18
arXiv cs.AI·

Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

This paper introduces Generative-Evaluative Agreement (GEA), a validity criterion that measures whether an LLM's scoring function recovers the skill levels its generative function was instructed to produce. On a two-stage adaptive assessment, the recovered scores correlate with the intended levels at r=0.698 (r² ≈ 0.49, about half of the intended variance) with a systematic positive bias. GEA is strong (r>0.7) for syntactically verifiable skills but near zero for design-level skills.

Evals · Reasoning · AI safety
SIG 72 · HYP 18
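A GEA-style check, as described above, compares the skill levels a generator was asked to produce with the levels a scorer later assigns. The sketch below computes a Pearson correlation and a mean bias on made-up data; the lists `intended` and `recovered` are hypothetical values chosen for illustration, and the paper's actual procedure may differ. For reference, squaring the reported r=0.698 gives r² ≈ 0.487.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: skill level requested from the generator (1 to 5)
# and the level the scorer assigned to the resulting item.
intended = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
recovered = [2, 3, 3, 5, 5, 1, 3, 4, 4, 5]

r = pearson_r(intended, recovered)
# Positive bias means the scorer rates items higher than intended.
bias = sum(s - t for s, t in zip(recovered, intended)) / len(intended)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, mean bias = {bias:+.2f}")
```

Reporting the bias alongside r matters because a scorer can track the intended ordering well while still rating every item too high.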
arXiv cs.AI·

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

ReElicit is a Bayesian optimization framework for tuning system prompts using only aggregate feedback. An LLM dynamically elicits a compact, interpretable feature space, then a Gaussian process selects optimized target vectors that are refined into deployable prompts. Across 10 tasks with a 30-evaluation budget, ReElicit outperforms aggregate-only prompt optimization baselines.

Prompt engineering · Reasoning
SIG 72 · HYP 25
arXiv cs.AI·

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

The paper describes a microservice architecture for production Document AI pipelines covering classification, OCR, and LLM-based structured field extraction. The system processes thousands of multi-page documents per hour. Key findings: OCR, not LLM parsing, dominates end-to-end latency, and system saturation is determined by shared GPU capacity. The paper also presents concrete architectural patterns for production deployment.

Infrastructure · Code generation · RAG
SIG 72 · HYP 15
arXiv cs.LG·

Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints

LILAC+ is a framework for safe continual reinforcement learning in nonstationary environments. It combines three adaptive mechanisms: context-based safety constraints, adaptation-speed constraints, and budget-to-state enforcement. Evaluated in simulated driving, it reduces safety violations under distribution shift while maintaining competitive task performance.

Reinforcement learning · AI safety · Alignment
SIG 72 · HYP 18
arXiv cs.LG·

From Cumulative Constraints to Adaptive Runtime Safety Control for Nonstationary Reinforcement Learning

CPSS (Constraint Projection Safety Shield) converts cumulative safety budgets into adaptive state-level control constraints for nonstationary reinforcement learning. The mechanism dynamically adjusts risk thresholds based on context, guarantees per-state threshold satisfaction, and reduces safety violations in highway merging scenarios.

Reinforcement learning · AI safety · Reasoning
SIG 72 · HYP 18
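To make the idea of turning a cumulative safety budget into a per-step constraint concrete, here is a minimal sketch. It uses a simple even-split rule (remaining budget divided by remaining steps, scaled by a context factor) and is not the paper's projection mechanism. The functions `step_threshold`, `shield`, and `run_episode`, the action set, and the integer cost units are all hypothetical.

```python
def step_threshold(budget_left, steps_left, context_scale=1.0):
    """Per-step cost allowance, never exceeding the remaining budget."""
    if steps_left <= 0 or budget_left <= 0:
        return 0.0
    return min(budget_left, context_scale * budget_left / steps_left)


def shield(actions, threshold):
    """Pick the highest-reward action whose cost fits the threshold.

    actions: list of (name, reward, cost) tuples. If nothing fits,
    fall back to the cheapest action.
    """
    allowed = [a for a in actions if a[2] <= threshold]
    if allowed:
        return max(allowed, key=lambda a: a[1])
    return min(actions, key=lambda a: a[2])


def run_episode(horizon, budget, actions, scales):
    """Run one episode and return the total safety cost spent."""
    budget_left = budget
    spent = 0
    for t in range(horizon):
        thr = step_threshold(budget_left, horizon - t, scales[t])
        _name, _reward, cost = shield(actions, thr)
        budget_left -= cost
        spent += cost
    return spent


# Hypothetical merging actions with integer cost units; "brake" is a
# zero-cost fallback, which is what keeps the total within budget.
ACTIONS = [("merge_now", 1.0, 5), ("wait", 0.2, 1), ("brake", 0.0, 0)]

spent_flat = run_episode(10, 10, ACTIONS, [1.0] * 10)
spent_risky = run_episode(10, 10, ACTIONS, [3.0] * 10)
print(spent_flat, spent_risky)
```

Because every chosen cost is either within the threshold (itself capped at the remaining budget) or zero, the cumulative cost in this sketch cannot exceed the budget, whatever the context scale.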