Reddit r/LocalLLaMA

https://www.reddit.com/r/LocalLLaMA/

Building a free, offline LLM “tutor” grounded in one university textbook — RAG, LoRA, or both? Sanity check wanted

A developer wants to build a free, offline AI tutor grounded in a single university textbook. The proposed architecture uses RAG as the core component (chunking, embedding, and retrieval with page and section citations), plus an optional LoRA for pedagogical style. The post asks about model selection (Qwen, Gemma), handling complex content such as figures and equations, and packaging for non-technical users.

RAG · Fine-tuning · Open source
SIG 35 · HYP 15
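
The retrieval-with-citations step in the summary above can be sketched as follows. This is a minimal illustration, not the poster's design: bag-of-words cosine similarity stands in for a real embedding model, and the `Chunk` type, the sample passages, and the page and section values are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """One textbook passage plus the metadata needed to cite it (hypothetical schema)."""
    text: str
    page: int
    section: str


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a real system would use an embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    # Score every chunk against the query and keep the top k with a non-zero score.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c.text))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, c) for score, c in scored[:k] if score > 0]


def format_citation(chunk):
    # Page and section citation to attach to the tutor's answer.
    return f"[p. {chunk.page}, {chunk.section}]"


# Made-up sample chunks, for illustration only.
CHUNKS = [
    Chunk("Entropy measures the average uncertainty of a random variable.", 12, "1.2 Entropy"),
    Chunk("A binary search tree keeps keys in sorted order.", 87, "4.1 Trees"),
    Chunk("Mutual information is the reduction in uncertainty about one variable given another.", 19, "1.4 Mutual information"),
]

hits = retrieve("What does entropy measure?", CHUNKS)
for score, chunk in hits:
    print(format_citation(chunk), chunk.text)
```

Keeping page and section on each chunk from the chunking step onward is what lets the answer cite its source; the scoring function can be swapped for an embedding model without changing that structure.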

I spent months inside verl (an RL post-training framework), forked it, then stopped. Wrote up the internals, the tooling a fork costs, and a nasty NCCL bug.

A researcher who spent months inside verl (ByteDance's RL post-training framework) documents its internals: RLHF loop orchestration, the single-controller pattern, its data structures (DataProto), and an NCCL bug found along the way. The fork was abandoned, but the write-up shares what was learned with the community.

Reinforcement learning · AI Agents · Open source
SIG 65 · HYP 15

A lightweight, real-time multilingual ASR router that runs on local hardware

A lightweight multilingual ASR routing system for local hardware, built on Zipformer, Silero VAD, and SpeechBrain. It routes audio to specialized monolingual models (~100M parameters) instead of running one large model. It reports 13% WER on inter-utterance code-switching, outperforming cloud APIs. Known limitation: 41% WER on intra-utterance switching. An open-source repo is available.

Voice · Open source · Tools
SIG 78 · HYP 25
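
For readers unfamiliar with the WER figures quoted above, word error rate is the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The sketch below uses the standard definition; the `wer` function name and the sample sentences are illustrative and not taken from the project.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
score = wer("the cat sat on the mat", "the cat sit on mat")
print(f"{score:.1%}")
```

Because the denominator is the reference length, WER can exceed 100% when the output contains many inserted words.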

I was a Data Scientist for 10 years before becoming a quadriplegic. For the past 3 months, I built VibeETL from scratch: A lightning-fast, visual Alteryx alternative powered by Polars & React Flow.

VibeETL is an open-source visual ETL platform built in 3 months by a former data scientist. It pairs a Polars + Rust backend with a React Flow frontend that uses a native BFS layout algorithm. It has zero external dependencies and runs Python in a sandbox with a 30 s timeout. It is positioned as a lightweight Alteryx alternative.

Open source · Tools · Infrastructure
SIG 72 · HYP 45
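
One way a BFS-based layout for a node graph might work is sketched below. The summary does not describe VibeETL's actual algorithm, so everything here is an assumption: the `bfs_layout` function, the gap parameters, and the sample pipeline are hypothetical. Each node's column is its BFS depth from the source nodes, and nodes sharing a column are stacked in rows.

```python
from collections import deque


def bfs_layout(edges, x_gap=200, y_gap=100):
    """Assign (x, y) positions to the nodes of a pipeline graph by BFS depth."""
    nodes = []       # nodes in first-seen order, for stable row assignment
    children = {}
    indegree = {}
    for src, dst in edges:
        for name in (src, dst):
            if name not in children:
                children[name] = []
                indegree[name] = 0
                nodes.append(name)
        children[src].append(dst)
        indegree[dst] += 1

    # Start the search from every source node (no incoming edges) at depth 0.
    depth = {name: 0 for name in nodes if indegree[name] == 0}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth:  # first visit gives the shortest distance
                depth[child] = depth[node] + 1
                queue.append(child)

    # Column = depth; row = how many nodes were already placed in that column.
    rows_used = {}
    positions = {}
    for name in nodes:
        if name not in depth:
            continue  # unreachable from any source (only possible with cycles)
        col = depth[name]
        row = rows_used.get(col, 0)
        rows_used[col] = row + 1
        positions[name] = (col * x_gap, row * y_gap)
    return positions


# Hypothetical ETL pipeline: one reader fans out to two steps that join again.
pipeline = [
    ("read", "filter"),
    ("read", "clean"),
    ("filter", "join"),
    ("clean", "join"),
    ("join", "write"),
]
layout = bfs_layout(pipeline)
print(layout)
```

Plain BFS depth is the shortest distance from a source, so a node fed by both a short and a long path lands in the earlier column; a longest-path layering avoids that at the cost of a topological sort.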

I bolted an 8-arm reasoning MoE onto a frozen 1.4B Mamba backbone on a single RTX 3060. Here’s the mechanistic autopsy of what broke and what worked.

A researcher built Mamba-Titan-1.4B-Reasoning (a 2.54B-parameter MoE) on a single RTX 3060 by freezing a 1.4B Mamba-1 backbone and adding 8 trainable experts. Trained on DeepSeek CoT traces, the model developed what the author calls a 'vault door' mechanism: the </think> token settles at the smallest norm (1.991 versus a mean of 4.742), which controls when latent reasoning terminates.

Reasoning · Fine-tuning · Open source
SIG 78 · HYP 35
Reddit r/LocalLLaMA — AI feed · Signal IA