5525 articles
arXiv cs.AI

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Study on harness self-evolution (prompts, skills, memories, tools) in LLM agents. Analyzes two capabilities: harness-updating (producing useful updates) and harness-benefit (benefiting from them). Findings: harness-updating is capability-agnostic (Qwen3.5-9B matches Claude Opus gains), while harness-benefit is non-monotonic (mid-tier models benefit most).

AI Agents · Prompt engineering · Benchmarks
SIG 75 · HYP 15
arXiv cs.LG

Supervised Training Rapidly Degrades Early Visual Cortex Alignment Across Biologically Plausible Learning Rules

Untrained neural networks match early visual cortex better than trained networks. A study using 720 THINGS images and fMRI from 3 subjects shows that one training epoch reduces V1 alignment by 25-90%, depending on the learning rule. Backpropagation degrades alignment the most (Δr = -0.080), while predictive coding and STDP preserve it better (Δr ≈ -0.04).

Papers · Reasoning · Alignment
SIG 75 · HYP 15
GitHub Trending

Comfy-Org / ComfyUI

ComfyUI is a modular GUI for diffusion models with a node/graph-based interface, providing API and backend capabilities for image generation.

Image generation · Open source · Tools
SIG 75 · HYP 25
Reddit r/LocalLLaMA

PolyRange: Contamination-resistant offensive-AI benchmark for web targets (that ain't a benchmark, THAT's a benchmark)

PolyRange is a cybersecurity AI benchmark that dynamically generates fresh web targets for each evaluation, which is meant to eliminate training-corpus contamination. The author cites a consensus among labs (Anthropic, OpenAI, DeepMind) that static benchmarks are saturated and real-world defenses are missing. The project is MIT-licensed and independent of the author's commercial project.

Benchmarks · AI safety · Evals
SIG 75 · HYP 25
GitHub Trending

anthropics / claude-code

Claude Code is an agentic coding tool that runs in the terminal, understands your codebase, and executes routine tasks, explains complex code, and handles git workflows through natural language commands.

Claude · Claude Code · AI Agents
SIG 75 · HYP 35