arXiv cs.AI·

DELTAMEM: Incremental Experience Memory for LLM Agents via Residual Trees

DeltaMem organizes LLM agent experience memory into two residual trees: one stores goal-conditioned tasks as reusable skills, the other stores scene-level environment knowledge. Each tree uses root nodes for generalized base experiences and delta nodes for variations on them, eliminating redundancy. An autonomous consolidation mechanism distills high-frequency paths into new root nodes.
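
The root/delta split described above can be illustrated with a minimal sketch. This is not the paper's implementation: the `Node` class, its `content` field, and the example skill names are all hypothetical, and the merge-by-dictionary-update rule is an assumption about how a delta might be applied to its base.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node in a hypothetical residual tree (illustrative only)."""
    name: str
    # Root node: the full base experience. Delta node: only the fields that differ.
    content: dict
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_delta(self, name: str, changes: dict) -> "Node":
        """Attach a variation that stores only what changes relative to this node."""
        child = Node(name=name, content=changes, parent=self)
        self.children.append(child)
        return child

    def resolve(self) -> dict:
        """Rebuild the full experience by applying deltas from the root down."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        merged = {}
        for n in reversed(chain):  # root first, most specific delta last
            merged.update(n.content)
        return merged


# Hypothetical skill: a base experience plus two variations.
root = Node("open_container", {"action": "open", "target": "container", "precondition": "unlocked"})
fridge = root.add_delta("open_fridge", {"target": "fridge"})
safe = root.add_delta("open_safe", {"target": "safe", "precondition": "code entered"})

print(fridge.resolve())
print(safe.resolve())
```

Each delta node holds only the fields it overrides, so the shared part of the experience lives once in the root. The consolidation step mentioned in the summary is not modeled here.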

AI Agents, Reasoning, Papers
SIG 75, HYP 25
arXiv cs.LG·

Human-in-the-Loop Contextual Bandits for Short-Term Rental Dynamic Pricing: Structural Equivalence of Historical Warm-Up and Approval-Gated Live Learning

The HITL-GB framework targets dynamic pricing for short-term rentals: a contextual bandit algorithm generates price recommendations that a human can accept, modify, or reject. The authors show that historical data is structurally equivalent to on-policy warm-up, which reduces the cold-start period from ~150 to ~30 episodes. Validated on 1,461 real nights (April 2022–2026).

AI Agents, Reinforcement learning, Benchmarks
SIG 75, HYP 15
GitHub Trending·

chopratejas/headroom

Headroom compresses tool outputs, logs, files, and RAG chunks before they are sent to an LLM. The project reports a 60-95% reduction in token consumption without quality loss. Available as a library, a proxy, and an MCP server.

RAG, MCP, Tools
SIG 75, HYP 25
arXiv cs.LG·

ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

ARCA introduces a token-level credit assignment method for LLM reinforcement learning that addresses the degeneracy of intrinsic signals (surprisal, entropy reduction, policy divergence) under LoRA. It measures adapter salience directly via the L2 norm of hidden-state residuals instead of output-distribution shifts. Tested on MATH with Qwen3-1.7B under GRPO, ARCA avoids pathological weight concentration.
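
As a rough illustration of the "L2 norm of hidden-state residuals" idea, the sketch below computes, for each token, the norm of the difference between a hidden state with the adapter active and the same hidden state without it. It is not ARCA's actual procedure: the function names, the toy two-dimensional hidden states, and the normalization of saliences into weights that sum to one are all assumptions made for illustration.

```python
import math


def adapter_salience(base_hidden, adapted_hidden):
    """Per-token L2 norm of the residual (adapted - base) hidden state.

    Both arguments are lists of equal-length vectors, one vector per token.
    """
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(adapted, base)))
        for base, adapted in zip(base_hidden, adapted_hidden)
    ]


def credit_weights(salience, eps=1e-8):
    """Normalize saliences into weights that sum to 1 (an assumed scheme).

    Falls back to uniform weights when the adapter changed nothing.
    """
    total = sum(salience)
    if total < eps:
        return [1.0 / len(salience)] * len(salience)
    return [s / total for s in salience]


# Toy hidden states for three tokens (hypothetical values).
base = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
adapted = [[3.0, 4.0], [1.0, 1.0], [2.0, 3.0]]

salience = adapter_salience(base, adapted)  # [5.0, 0.0, 1.0]
weights = credit_weights(salience)
print(salience, weights)
```

The second token gets zero weight because the adapter left its hidden state unchanged, which is the sense in which the signal reflects the adapter's effect rather than a shift in the output distribution.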

Reinforcement learning, Fine-tuning, Reasoning
SIG 75, HYP 15
arXiv cs.LG·

AI-Guided Design and Optimization of Graphite-Based Anodes via Iterative Experimental Feedback

An iterative AI workflow optimizes graphite-based anodes through sequential learning and experimental feedback loops. The Citrine Platform generates surrogate models and refines manufacturing constraints. Results: fabrication reliability improved from frequent failures to 100% success, the share of cells reaching ≥350 mAh/g increased from 28.4% to 84.8%, and capacity retention rose from 42.1% to 97.3%.

Reinforcement learning, Benchmarks, Tools
SIG 75, HYP 15