5883 articles
arXiv cs.CL·

How Do Document Parsers Break? Auditing Structural Vulnerability in Document Intelligence

Robustness study of Document Layout Analysis (DLA) pipelines used in RAG and long-document QA. Authors identify footprint bias and propose a lightweight auditing framework measuring block-level structural loss (B-SLR). On 1,000 pages with MinerU and PP-StructureV3, B-SLR correlates better with OCR instability (R²=0.727/0.916) than area-based metrics (R²=0.384/0.110).

Papers, Evals, RAG
SIG 72
HYP 18
arXiv cs.CL·

IMLJD: A Computational Dataset for Indian Matrimonial Litigation Analysis

IMLJD is a dataset of 3,613 Indian court judgments on matrimonial disputes (IPC Section 498A, Protection of Women from Domestic Violence Act, CrPC Section 482). Data from Supreme Court of India (2000-2024, 1,474 cases) and Karnataka High Court (2018-2024, 2,139 cases). Quashing petition success rates: 57.6% at Supreme Court vs 39.7% at Karnataka High Court. Dataset, code, and knowledge graph released open-source.

Benchmarks, Papers, Open source
SIG 72
HYP 15
Reddit r/MachineLearning·

I built a tool that shows you what GPT-2 is "thinking" in real-time as it generates: 3D graph of concept activations per token [R]

AXON visualizes real-time concept activations in GPT-2 through a 3D force-directed graph. A Sparse Autoencoder decomposes the residual stream into interpretable features (geography, cities, languages) per generated token. Stack: TransformerLens + SAELens (backend), FastAPI WebSocket, Three.js (frontend). ~35ms/token on GPU.

GPT, Open source, Tools
SIG 72
HYP 35
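
The sparse-autoencoder step in the AXON item above can be illustrated with a minimal sketch. It assumes one common SAE encoder formulation, f = ReLU((x - b_dec) W_enc + b_enc), and uses toy NumPy weights; the function names, dimensions, and values are hypothetical and are not taken from AXON, TransformerLens, or SAELens.

```python
import numpy as np


def sae_encode(resid, w_enc, b_enc, b_dec):
    """Encode one residual-stream vector into non-negative feature activations.

    Assumed formulation (a common one, not confirmed as AXON's):
    f = ReLU((x - b_dec) @ W_enc + b_enc)
    """
    return np.maximum(0.0, (resid - b_dec) @ w_enc + b_enc)


def top_features(acts, k):
    """Return up to k (feature_index, activation) pairs, strongest first.

    Features with zero activation are dropped, since an SAE is meant to be sparse.
    """
    order = np.argsort(acts)[::-1][:k]
    return [(int(i), float(acts[i])) for i in order if acts[i] > 0]


# Toy setup: a 4-dimensional "residual stream" and 6 dictionary features.
d_model, n_features = 4, 6
w_enc = np.eye(d_model, n_features)   # hypothetical encoder weights
b_enc = np.zeros(n_features)          # hypothetical encoder bias
b_dec = np.zeros(d_model)             # hypothetical decoder bias

resid = np.array([1.0, -2.0, 0.5, 0.0])  # stand-in for one token's residual vector
acts = sae_encode(resid, w_enc, b_enc, b_dec)
active = top_features(acts, k=3)
print(active)
```

In a visualizer of this kind, the active feature indices for each token would presumably become the nodes that light up in the graph; how AXON maps features to nodes is not described in the summary.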
Reddit r/LocalLLaMA·

PrivateScribe.ai - Fully local, MIT licensed, free AI transcription built with HIPAA/legal safeguards in mind - One Year Update!

PrivateScribe.ai, a fully local, open-source transcription platform (MIT license), announces v1 with a signed macOS app. Stack: FasterWhisper, pyannote, Ollama, Vite/Flask/SQLite. Features: 256-bit encryption, zero network calls, audit trail, speaker diarization. Built for clinics, law firms, and therapists, with HIPAA and legal safeguards in mind.

Open source, Voice, Code generation
SIG 72
HYP 28
Reddit r/LocalLLaMA·

A tool I built to generate 3D objects with functional, articulated parts. It's on github, and is mostly LLM-agnostic.

Open-source tool to generate 3D objects with articulated, functional parts. Instead of diffusion, which tends to produce point-cloud blobs, the pipeline uses an LLM as a structured code compiler that generates native Blender Python code targeting specific scene graph nodes. Flutter/Three.js frontend, model-agnostic. Gemini is recommended; local models still hallucinate on complex matrix transforms.

Code generation, Open source, Tools
SIG 72
HYP 35
Reddit r/MachineLearning·

Graph spectral analysis (Fiedler value + Scheffer CSD indicators) predicts grokking 21k steps before loss function - five reproducible experiments [R]

Graph spectral analysis (Fiedler value + Scheffer critical slowing down) predicts grokking 21k steps before loss convergence. Five reproducible CPU experiments: early detection, distinct structural fingerprints for grokking vs catastrophic forgetting, guided intervention preserves 91.7% vs 2.6%, 48x acceleration across sequential tasks. Limited to 2-layer MLPs and 1-layer transformers.

Papers, Evals, Reasoning
SIG 72
HYP 28
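
For readers unfamiliar with the Fiedler value named in the grokking item above: it is the second-smallest eigenvalue of a graph Laplacian and measures how well connected the graph is (it is zero when the graph is disconnected). The sketch below computes it for small toy graphs with NumPy. How the post derives a graph from network weights is not stated in the summary, so the adjacency matrices here are hypothetical examples, and the critical-slowing-down indicators are not covered.

```python
import numpy as np


def fiedler_value(adjacency):
    """Second-smallest eigenvalue of the unnormalized Laplacian L = D - A.

    Expects a symmetric (undirected) adjacency matrix with at least two nodes.
    """
    a = np.asarray(adjacency, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape[0] < 2:
        raise ValueError("adjacency must be a square matrix with at least 2 nodes")
    if not np.allclose(a, a.T):
        raise ValueError("adjacency must be symmetric (undirected graph)")
    laplacian = np.diag(a.sum(axis=1)) - a
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return float(eigenvalues[1])


# Toy graph 1: a path 0 - 1 - 2. Its Laplacian eigenvalues are 0, 1, 3.
path_graph = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Toy graph 2: two separate edges (0 - 1 and 2 - 3), so the graph is disconnected.
two_components = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])

print(fiedler_value(path_graph))      # approximately 1.0
print(fiedler_value(two_components))  # approximately 0.0
```

Tracking this quantity across training steps would require rebuilding the graph at each checkpoint; that construction is the part the summary leaves unspecified.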
GitHub Trending·

rtk-ai / rtk

rtk is a Rust CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single binary, zero dependencies.

Tools, Infrastructure, Code generation
SIG 72
HYP 25
GitHub Trending·

Michael-A-Kuykendall / shimmy

Shimmy: Rust inference server compatible with OpenAI API, Python-free. Supports GGUF and SafeTensors, hot model swap, auto-discovery, single binary. Free forever.

Open source, Infrastructure, Code generation
SIG 72
HYP 35
GitHub Trending·

tirth8205 / code-review-graph

Code-Review-Graph builds a local knowledge graph for Claude Code. It creates a persistent map of your codebase, reducing tokens by 6.8× on reviews and up to 49× on daily coding tasks.

Claude Code, RAG, Code generation
SIG 72
HYP 45