5831 articles
arXiv cs.LG

Test-Time Collective Action: Proxy-Based Perturbations for Correcting Algorithmic Harms

New framework enabling user collectives to correct algorithmic disparities without platform intervention. Test-Time Collective Action (TTCA) uses universal perturbations derived from a proxy model to improve fairness without training access. Validation on CIFAR-10, CIFAR-100, and FairFace demonstrates closure of subgroup accuracy gaps and improved worst-group accuracy.

AI safety · Alignment · Evals
SIG 72 · HYP 18

arXiv cs.LG

Balancing Fidelity and Diversity in Diffusion Models via Symmetric Attention Decomposition: Hopfield Perspective

Theoretical paper decomposing the pre-softmax attention matrix QK^T into symmetric and skew-symmetric components. The symmetric part governs the energy landscape, while the skew-symmetric part drives circulation. The authors propose Hopfield-style stability measures to quantify the fidelity-diversity trade-off in generation, plus a controllable mechanism to modulate it.
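The decomposition named in the summary is the standard split of a square matrix into symmetric and skew-symmetric parts. The sketch below shows only that split on a toy score matrix. The helper names and the toy Q and K are assumptions for illustration, and the paper's stability measures and control mechanism are not reproduced here.

```python
def attention_scores(Q, K):
    """Pre-softmax scores A = Q K^T for matrices given as lists of rows."""
    return [[sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
            for q_row in Q]


def split_symmetric_skew(A):
    """Return (S, W) with S = (A + A^T) / 2 and W = (A - A^T) / 2."""
    n = len(A)
    S = [[0.5 * (A[i][j] + A[j][i]) for j in range(n)] for i in range(n)]
    W = [[0.5 * (A[i][j] - A[j][i]) for j in range(n)] for i in range(n)]
    return S, W


# Toy example: 3 tokens, head dimension 2 (values chosen arbitrarily).
Q = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
K = [[0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]

A = attention_scores(Q, K)
S, W = split_symmetric_skew(A)
```

By construction S equals its transpose, W equals the negative of its transpose, and S + W recovers A, so the two parts can be analyzed separately without losing information.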

Reasoning · Papers · Vision
SIG 72 · HYP 15

arXiv cs.LG

Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation

Bayesian framework for validating deployment of learned autonomous landing controllers. It uses Bayesian inference to quantify uncertainty about the true policy capability, going beyond empirical metrics such as reward and success rate. Experiments with PPO and SAC show that empirical metrics alone can be overconfident, while Bayesian inference gives a better-calibrated assessment of deployment readiness.
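To make the gap between an empirical success rate and a posterior belief concrete, here is a minimal sketch using a Beta-Binomial model. This is an assumed stand-in, not the paper's model: the uniform Beta(1, 1) prior, the function names, and the 0.95 approval bar are all hypothetical choices for illustration.

```python
import math


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def prob_success_rate_above(successes, rollouts, threshold,
                            prior_a=1.0, prior_b=1.0, steps=20000):
    """Posterior P(true success rate > threshold) after finite rollouts.

    Assumes independent rollouts and a Beta(prior_a, prior_b) prior, so the
    posterior is Beta(prior_a + successes, prior_b + failures). The tail
    probability is integrated numerically with the midpoint rule.
    """
    a = prior_a + successes
    b = prior_b + (rollouts - successes)
    width = (1.0 - threshold) / steps
    total = 0.0
    for i in range(steps):
        p = threshold + (i + 0.5) * width  # midpoint, strictly inside (0, 1)
        total += beta_pdf(p, a, b) * width
    return total


# 19 successful landings out of 20 rollouts; requirement: success rate > 0.9.
posterior = prob_success_rate_above(19, 20, 0.9)
approve = posterior >= 0.95  # hypothetical approval bar
```

In this toy case the empirical success rate is 0.95, yet the posterior probability that the true rate exceeds 0.9 is only about 0.64, so the hypothetical bar is not met. More rollouts would be needed before approval.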

Reinforcement learning · AI safety · Robotics
SIG 72 · HYP 15

arXiv cs.LG

High-Fidelity Industrial Crash Dynamics Prediction via Geometry-Aware Operator Learning with Memory-Efficient Low-Rank Attention

GeoTransolver, a geometry-aware operator learning framework, accurately predicts industrial-scale automotive crash dynamics. On bumper beam and full-vehicle crash datasets, it captures plastic deformations and acceleration profiles. A FLARE-based modification reduces memory overhead by 2x while improving accuracy for high-frequency transients.

Papers · Benchmarks · Reasoning
SIG 72 · HYP 25

arXiv cs.AI

EAPO: Entropy-Driven Adaptive Positive-Negative Sample Weighting for Policy Optimization in Open-Ended QA

EAPO is an adaptive policy optimization method for training reasoning models in open-ended QA. It dynamically adjusts positive and negative sample weights based on the ratio of current to initial policy entropy, aiming to preserve exploration and stability. Tests on two medical QA datasets show improvements in diversity and stability versus fixed-weight baselines.
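The summary does not give the weighting rule, so the following is only a sketch of the general idea of entropy-ratio-driven weights. The linear rule, the clipping bounds, and the direction of the adjustment (down-weighting positive samples when entropy has collapsed) are all assumptions, not EAPO's actual formula.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_weights(current_entropy, initial_entropy, low=0.5, high=1.5):
    """Hypothetical rule mapping the entropy ratio to (w_pos, w_neg).

    ratio < 1 means the policy has become more peaked than at the start,
    so this sketch shifts weight from positive to negative samples.
    The two weights always sum to 2, so a ratio of 1 gives (1.0, 1.0).
    """
    ratio = current_entropy / initial_entropy
    ratio = min(max(ratio, low), high)  # clip to keep weights bounded
    return ratio, 2.0 - ratio


initial = entropy([0.25, 0.25, 0.25, 0.25])    # uniform starting policy
collapsed = entropy([0.85, 0.05, 0.05, 0.05])  # policy after sharpening
w_pos, w_neg = adaptive_weights(collapsed, initial)
```

A fixed-weight baseline corresponds to always returning (1.0, 1.0), which is what this rule produces when the entropy has not moved from its initial value.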

Reinforcement learning · Reasoning · Evals
SIG 72 · HYP 18

arXiv cs.LG

Supervised Distributional Reduction via Optimal Transport and Dependence Maximization

SDR (Supervised Distributional Reduction) combines optimal transport and dependence maximization to learn target-aware representations. The algorithm extends the Fused Gromov-Wasserstein objective with an explicit dependence term, producing compact embeddings that capture both geometric structure and predictive signal. It is applied to Gaussian Process modelling with adaptive kernels.

Papers
SIG 72 · HYP 15

arXiv cs.AI

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

Hierarchical framework for compact LLMs in resource-constrained agentic systems. Model distillation is combined with an oracle-controller loop that monitors protocol validity, projects histories into a feasible prompt domain, and triggers lightweight fine-tuning under drift. The framework separates schema learning from semantic adaptation. It is evaluated on Multi-Fidelity Bayesian Optimization, with improved reliability and cost-efficiency.

AI Agents · Fine-tuning · Prompt engineering
SIG 72 · HYP 18

arXiv cs.LG

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

E³-Agent is an executable and evolving agent for edge generative inference resource management. It pairs a fast-path router (millisecond dispatch) with a slow-path LLM meta-controller driven by events, learning online from execution feedback. Evaluated in simulation, it reduces latency by 65-73% versus static baselines across dynamic scenarios (semantic shifts, device churn, hidden drift).

AI Agents · Reasoning · Infrastructure
SIG 72 · HYP 28

Reddit r/MachineLearning

Cross-species RSA: same learning rules (BP, PC, STDP, FA) tested against both human fMRI and macaque electrophysiology [P]

Cross-species comparison of learning rules (BP, PC, STDP, FA) tested against human fMRI and macaque electrophysiology (V1/V2/V4/IT). STDP and PC dominate in V1/V2 (ρ ≈ 0.30/0.28), consistent with the human pattern. In IT, alignment depends on model capacity (ResNet-50: ρ ≈ 0.25) rather than on the learning rule. Code and two papers (arXiv 2604.16875, 2605.22401) are available.
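For readers unfamiliar with representational similarity analysis (RSA), a common recipe is to build a representational dissimilarity matrix (RDM) for the model and for the brain data, then rank-correlate the two. The sketch below assumes correlation distance for the RDMs and Spearman's ρ for the comparison. The post may use different choices, and the function names and toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rdm_upper(patterns):
    """Upper triangle of the RDM, using 1 - Pearson r between stimuli."""
    n = len(patterns)
    return [1.0 - pearson(patterns[i], patterns[j])
            for i in range(n) for j in range(i + 1, n)]


def rsa_score(model_patterns, brain_patterns):
    """Rank correlation between the model RDM and the brain RDM."""
    return spearman(rdm_upper(model_patterns), rdm_upper(brain_patterns))


# Toy data: 4 stimuli x 4 units; "brain" is an affine copy of the model.
model = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1], [1, 3, 2, 4]]
brain = [[2 * v + 1 for v in row] for row in model]
score = rsa_score(model, brain)
```

Because correlation distance ignores affine rescaling of each pattern, the toy brain data yields the same RDM as the model and a score of 1. Real comparisons land far lower, as the reported ρ values indicate.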

Papers · Benchmarks · Reasoning
SIG 72 · HYP 15

Reddit r/LocalLLaMA

Turning every "no thats not what i meant" in chat into actual LoRA training data

A developer built TideForge, a desktop app that converts chat corrections into LoRA training data. Each model reply has a "Teach" button; corrections accumulate as JSONL and trigger PEFT fine-tuning on your base model. Initial test: 110 hand-written corrections on Qwen 0.6B, loss dropped 4.25→0.73, adapter maintained identity across ~30 jailbreak prompts. Free, Windows, GGUF-compatible.
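As a rough illustration of the "corrections accumulate as JSONL" step, the sketch below appends one record per correction and reads the file back as prompt/target pairs. The field names (`prompt`, `rejected`, `chosen`) and both function names are hypothetical. TideForge's actual schema is not given in the post, and the PEFT fine-tuning step is left out.

```python
import json
from pathlib import Path


def record_correction(path, prompt, rejected_reply, corrected_reply):
    """Append one user correction as a single JSON line (hypothetical schema)."""
    record = {
        "prompt": prompt,
        "rejected": rejected_reply,  # what the model originally said
        "chosen": corrected_reply,   # what the user wanted instead
    }
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_training_pairs(path):
    """Read the JSONL file back as (prompt, target) pairs for fine-tuning."""
    pairs = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                record = json.loads(line)
                pairs.append((record["prompt"], record["chosen"]))
    return pairs
```

Keeping the rejected reply alongside the correction costs little and would also allow preference-style training later, although the post only mentions LoRA fine-tuning.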

Fine-tuning · Open source · Tools
SIG 72 · HYP 35