The strongest signal today comes from UMD, Google, and Meta: by letting Claude Code run freely via AutoTTS, researchers obtained a reasoning control algorithm that no one would likely have designed by hand. It cuts compute by 70% versus standard self-consistency while matching accuracy, and the whole search cost $40 and took 160 minutes. This isn't another benchmark; it's a demonstration that coding agents can now produce non-trivial research contributions on prototyping budgets. The immediate follow-up question: how many hand-designed algorithms in the current literature would turn out to be suboptimal under a systematic AutoTTS audit?
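For readers unfamiliar with the baseline, here is a minimal sketch of standard self-consistency (sample several answers, take the majority vote), next to one hand-written way of spending fewer samples. The early-stopping rule is only an illustration of what a "reasoning control" policy can look like. It is not the algorithm AutoTTS discovered, which the summary does not describe. The names `sample_answer`, `lead`, and both function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str], n_samples: int) -> str:
    """Baseline: draw a fixed number of samples and return the majority answer."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def early_stop_self_consistency(
    sample_answer: Callable[[], str], max_samples: int, lead: int
) -> Tuple[str, int]:
    """Illustrative control rule (an assumption, not the AutoTTS result):
    stop sampling once the top answer leads the runner-up by `lead` votes.
    Returns the chosen answer and the number of samples actually used."""
    votes: Counter = Counter()
    for used in range(1, max_samples + 1):
        votes[sample_answer()] += 1
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= lead:
            return ranked[0][0], used
    # Budget exhausted: fall back to a plain majority vote.
    return votes.most_common(1)[0][0], max_samples
```

In practice `sample_answer` would wrap one model call that returns a final answer string. The compute saving comes from `used` being smaller than the fixed budget on easy questions.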
Meanwhile, an independent benchmark across 30 long PDFs (171 questions, MMLongBench-Doc) resets assumptions about vision vs. OCR for document RAG. Claude Sonnet 4.5 in native vision mode reaches only 52% accuracy at $0.2552/query, making it both more expensive and less accurate than LlamaCloud premium + OCR, which hits 59.6% at $0.1885/query. The intrinsic failure rate of vision (7% vs. 0% for OCR after retry) is the number to remember: on production pipelines with SLAs, that gap cannot be absorbed. Vision LLMs remain brittle on charts and tables, precisely the elements that concentrate value in financial, regulatory, or technical documents.
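One way to fold accuracy and price into a single figure is cost per correct answer (cost per query divided by accuracy). This metric is not reported by the benchmark. It is an assumption made here for illustration, computed from the accuracy and cost figures quoted above, and `PipelineResult` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineResult:
    name: str
    accuracy: float        # fraction of questions answered correctly
    cost_per_query: float  # USD per query

    def cost_per_correct_answer(self) -> float:
        # Expected spend to obtain one correct answer at this accuracy.
        return self.cost_per_query / self.accuracy


# Figures as quoted in the benchmark summary.
vision = PipelineResult("Claude Sonnet 4.5 native vision", 0.52, 0.2552)
ocr = PipelineResult("LlamaCloud premium + OCR", 0.596, 0.1885)

for p in (vision, ocr):
    print(f"{p.name}: ${p.cost_per_correct_answer():.4f} per correct answer")

# How much more the vision route costs per correct answer.
ratio = vision.cost_per_correct_answer() / ocr.cost_per_correct_answer()
print(f"vision / OCR cost ratio: {ratio:.2f}x")
```

Under this metric the gap is wider than the raw per-query prices suggest, because the vision route is both pricier and less accurate. The calculation ignores the 7% failure rate, which would widen the gap further.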
On the local tooling side, llampart 1.0.0 ships under MIT as a standalone frontend for llama-server (llama.cpp), with MCP integration, 6-language support, and documented Caddy deployment. In the same vein, a web GUI for TradingAgents (Apache 2.0) adds Ollama to the multi-agent stock analysis stack with ~50% token reduction in concise mode. Both releases confirm a structural trend: the local ecosystem is professionalizing around llama.cpp as the reference runtime, with increasingly complete UI layers that narrow the experience gap with cloud APIs.
Researchers from UMD, Google, and Meta use AutoTTS to let Claude Code independently discover control algorithms for AI reasoning. The discovered algorithm reduces compute by 70% versus standard self-consistency while matching accuracy. The search cost $40 and took 160 minutes.
Benchmark on 30 long PDFs (171 questions) comparing vision LLMs vs OCR for document QA. Claude Sonnet 4.5 native PDF: 52% accuracy, $0.2552/query (ranked 5th of 6). LlamaCloud premium + OCR: 59.6%, $0.1885/query. Vision underperforms on charts/tables; premium OCR is more robust. Vision LLMs show a 7% intrinsic failure rate vs 0% for OCR after retry.
llampart 1.0.0, standalone local web UI for llama-server, released as MIT open-source. Features extended settings, 6-language localization, two-column conversation sidebar, MCP integration, interface modes (dark/light/Frosted Glass), local import/export, and Caddy deployment guide.
Benchmark on 30 long PDFs (171 questions) comparing native vision-LLMs vs OCR pipelines for document QA. Claude Sonnet 4.5 used. LlamaCloud premium achieves 59.6% accuracy ($0.1885/query), native vision 52% ($0.2552/query, most expensive). Vision underperforms on charts/tables; premium OCR more robust. Vision-LLM has 7% intrinsic failure rate vs 0% for OCR after retries.
Developer builds web GUI for TradingAgents, a multi-agent LLM stock analysis framework. Replaces CLI with local interface supporting Ollama, OpenAI, Anthropic, Google, DeepSeek and others. Adds live pipeline visualization, report reader, token reduction (~50% concise mode), multi-session chat. Apache 2.0.