arXiv cs.CL·

CausalSynth: Generating Structurally Sound Synthetic Data

CausalSynth is a framework for generating synthetic data that respects the causal mechanisms of a target domain. It combines a Structural Causal Model (SCM) that generates the causal skeleton, an LLM that acts as a constrained realizer, and iterative consistency verification that corrects structural violations. Evaluated on the ASIA, ALARM, and MIMIC-Struct benchmarks, it achieves 96% realizability with false-positive rates close to the significance level α = 0.05.

Papers · Reasoning · Benchmarks
SIG 78 · HYP 15
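The CausalSynth summary does not say how the skeleton or the verifier are implemented. As a minimal sketch, assuming a discrete SCM whose exogenous noise is recorded per sample, the code below draws a causal skeleton by ancestral sampling and then checks a (possibly LLM-edited) record against each structural equation, repairing violations in topological order. The toy graph, its probabilities, and the names `sample_skeleton`, `find_violations`, and `repair` are hypothetical; the LLM realizer and the statistical tests behind the reported α are not modeled.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical toy SCM: variable -> (parents, structural equation f(parent_values, noise)).
# Graph shape and probabilities are illustrative only, not taken from ASIA or the paper.
SCM = {
    "smoking": ((), lambda pa, u: u < 0.5),
    "lung_cancer": (("smoking",), lambda pa, u: u < (0.10 if pa["smoking"] else 0.01)),
    "bronchitis": (("smoking",), lambda pa, u: u < (0.60 if pa["smoking"] else 0.30)),
    "dyspnoea": (("lung_cancer", "bronchitis"),
                 lambda pa, u: u < (0.90 if pa["lung_cancer"] or pa["bronchitis"] else 0.10)),
}


def topo_order(scm):
    """Order variables so every parent comes before its children."""
    graph = {v: set(parents) for v, (parents, _) in scm.items()}
    return list(TopologicalSorter(graph).static_order())


def sample_skeleton(scm, rng):
    """Ancestral sampling: draw exogenous noise per variable, evaluate mechanisms in order."""
    values, noise = {}, {}
    for v in topo_order(scm):
        parents, f = scm[v]
        noise[v] = rng.random()
        values[v] = f({p: values[p] for p in parents}, noise[v])
    return values, noise


def find_violations(scm, record, noise):
    """List variables (in topological order) whose value disagrees with their structural equation."""
    bad = []
    for v in topo_order(scm):
        parents, f = scm[v]
        if record[v] != f({p: record[p] for p in parents}, noise[v]):
            bad.append(v)
    return bad


def repair(scm, record, noise, max_iters=10):
    """Recompute the earliest violating variable until the record is consistent."""
    record = dict(record)
    for _ in range(max_iters):
        bad = find_violations(scm, record, noise)
        if not bad:
            return record
        parents, f = scm[bad[0]]
        record[bad[0]] = f({p: record[p] for p in parents}, noise[bad[0]])
    raise RuntimeError("record still violates the SCM after max_iters repairs")
```

Fixing the earliest violation first means upstream variables are never disturbed by a repair, so the loop settles in at most one pass per variable. In use, a record produced by the realizer would be passed to `find_violations` together with the skeleton's noise, and only flagged fields would be regenerated.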
arXiv cs.CL·

SD-Search: On-Policy Hindsight Self-Distillation for Search-Augmented Reasoning

SD-Search introduces on-policy hindsight self-distillation for search-augmented reasoning agents. A single model acts as both the student (conditioned only on the context available at inference time) and the teacher (additionally conditioned on the search outcomes from its rollout group). Step-level supervision is applied through a Jensen-Shannon divergence at query positions and integrated into GRPO training, without external models or annotations.

Reasoning · Reinforcement learning · AI Agents
SIG 78 · HYP 15
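To make the SD-Search training signal concrete, here is a minimal sketch of a Jensen-Shannon term between student and teacher next-token distributions, averaged over query positions. It assumes both distributions are already available as probability vectors and that query tokens are marked by a boolean mask; `query_position_js_loss` is a hypothetical name, and how SD-Search weights this term against the GRPO objective (or stops gradients through the teacher) is not described in the summary and is not modeled here.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def query_position_js_loss(student_probs, teacher_probs, query_mask):
    """Average JS divergence between student and teacher distributions,
    taken only at positions flagged in query_mask (hypothetical masking scheme)."""
    terms = [
        js_divergence(s, t)
        for s, t, is_query in zip(student_probs, teacher_probs, query_mask)
        if is_query
    ]
    return sum(terms) / len(terms) if terms else 0.0
```

Unlike forward or reverse KL, the Jensen-Shannon divergence is symmetric and bounded above by ln 2, so a single confidently wrong position cannot dominate the step-level loss.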
arXiv cs.CL·

Protection Is (Nearly) All You Need: Structural Protection Dominates Scoring in Globally Capped KV Eviction

A study of KV cache eviction policies (LRU, H2O, SnapKV, StreamingLLM, Ada-KV, QUEST, and Random) under a global cache cap. Without structural boundary protection, every policy collapses to F1 ≤ 0.064. Reserving 10% of the cache at each boundary recovers 69–90% of full-cache quality on LongBench at C = 256 (13% retention). Position 0 alone holds ~75% of the attention mass, so protecting structurally critical tokens matters more than the differences between scoring rules.

Reasoning · Benchmarks · Papers
SIG 78 · HYP 15
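The core idea of the KV eviction study, reserving part of a global budget for structurally important tokens before any scoring policy runs, can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: it treats the two protected boundaries as the start of the sequence (which includes position 0) and the most recent tokens, and takes per-token scores from whatever eviction policy is in use. `select_kept_positions` and `protect_frac` are hypothetical names.

```python
from typing import List, Sequence


def select_kept_positions(scores: Sequence[float], cap: int, protect_frac: float = 0.10) -> List[int]:
    """Choose which token positions stay in a globally capped KV cache.

    Assumption: the two 'boundaries' are the start and the end of the sequence.
    protect_frac of the cap is reserved at each boundary; the remaining slots go to
    the highest-scoring tokens in between (scores may come from any policy).
    """
    n = len(scores)
    if n <= cap:
        return list(range(n))  # everything fits; nothing to evict
    reserve = max(1, int(cap * protect_frac))
    if 2 * reserve > cap:
        raise ValueError("cap too small for the requested boundary protection")
    # Head block includes position 0; tail block holds the most recent tokens.
    protected = set(range(reserve)) | set(range(n - reserve, n))
    budget = cap - len(protected)
    middle = sorted(
        (i for i in range(n) if i not in protected),
        key=lambda i: scores[i],
        reverse=True,
    )
    return sorted(protected | set(middle[:budget]))
```

With `protect_frac=0.10`, a cap of 256 reserves 25 slots at each end and leaves 206 for score-based selection. Because protection is applied before scoring, swapping in a different scoring policy only changes which middle tokens fill the remaining slots.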
arXiv cs.CL·

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

OverEager-Gen is a benchmark that measures out-of-scope actions taken by autonomous coding agents on benign tasks. On Claude Code, removing the consent declaration raises the overeager rate from 0% to 17.1% (p = 2.4×10⁻⁴). The benchmark comprises 500 validated scenarios and tests four products (Claude Code, OpenHands, Codex CLI, and Gemini CLI): overeager rates range from 5.4% to 27.7% in permissive mode versus 0.2% to 4.5% under an ask-to-continue framework.

AI Agents · Code generation · AI safety
SIG 78 · HYP 15
GitHub Trending·

aquasecurity / trivy

Trivy is an open-source security scanner that detects vulnerabilities, misconfigurations, and secrets, and generates SBOMs, across containers, Kubernetes, code repositories, and cloud environments.

Open source · AI safety · Infrastructure
SIG 75 · HYP 15
GitHub Trending·

lyogavin / airllm

AirLLM is an open-source project that enables inference with 70B-parameter models on a single 4GB GPU by partitioning the model weights and streaming them to the GPU, so the full model never has to reside in GPU memory at once.

Open source · Infrastructure · Llama
SIG 75 · HYP 35
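AirLLM's memory saving comes from holding only part of the model on the GPU at any moment. A minimal sketch of layer-by-layer streaming, assuming the model is split into per-layer shards that can be loaded on demand, is below. It tracks how many layers are resident rather than touching real GPU memory, and `run_streamed` and `make_scaling_loader` are hypothetical stand-ins, not AirLLM's API.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
LayerLoader = Callable[[], Layer]  # stands in for "read one layer's weights from disk"


def run_streamed(x: Vector, loaders: Sequence[LayerLoader]) -> Tuple[Vector, int]:
    """Run layers in order, loading each just before use and releasing it afterwards.

    Returns the output and the peak number of layers held at once (at most 1 here),
    compared with len(loaders) if every layer were loaded up front.
    """
    resident = peak = 0
    for load in loaders:
        layer = load()              # bring this layer's weights in
        resident += 1
        peak = max(peak, resident)
        x = layer(x)                # forward pass through this layer only
        del layer                   # drop the weights before loading the next layer
        resident -= 1
    return x, peak


def make_scaling_loader(scale: float) -> LayerLoader:
    """Hypothetical toy 'layer': multiplies every activation by a constant."""
    def load() -> Layer:
        return lambda v: [scale * a for a in v]
    return load
```

The trade-off is throughput: every forward pass re-reads all weights from storage, so generation speed tends to be limited by disk and host-to-GPU transfer bandwidth rather than by GPU compute.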