AI startup Polsia, which has no employees, closes a $30M funding round
Polsia, an AI startup with no employees, raises $30M on annual revenue of nearly $10M. Its business model, built on AI automation, is attracting investors.
Cost analysis: self-hosting on dual RTX 3090s (~$0.50-0.80/h including depreciation) vs. renting an H100 on RunPod (~$1.49-1.99/h, but 2-3x faster). For light usage (2-3 h/day), the cloud is cheaper. The real reasons to self-host are privacy, autonomy, learning, no cold starts, and sovereignty, none of which are economic.
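The "light usage favors the cloud" conclusion comes down to fixed versus variable cost: depreciation is paid every day, while cloud hours are billed only when used, and a 2-3x faster GPU needs proportionally fewer billed hours. The sketch below computes the break-even daily usage. The cloud price range and speedup come from the analysis above; the self-hosting inputs (daily depreciation, hourly power cost) are hypothetical placeholders, not figures from the source.

```python
def cloud_cost_per_day(local_hours, rate_per_h, speedup):
    """Cloud bill for the same work as `local_hours` on the local rig.

    A GPU that is `speedup` times faster needs local_hours / speedup billed hours.
    """
    return local_hours / speedup * rate_per_h


def selfhost_cost_per_day(local_hours, depreciation_per_day, power_per_h):
    """Depreciation accrues every day whether or not the GPUs run;
    electricity is paid only for hours of actual use."""
    return depreciation_per_day + local_hours * power_per_h


def break_even_hours(depreciation_per_day, power_per_h, rate_per_h, speedup):
    """Daily usage (in local-rig hours) above which self-hosting becomes cheaper."""
    cloud_per_local_hour = rate_per_h / speedup
    if cloud_per_local_hour <= power_per_h:
        raise ValueError("cloud is cheaper at any usage level")
    return depreciation_per_day / (cloud_per_local_hour - power_per_h)


# Cloud range from the text: RunPod H100 at $1.49-1.99/h, 2-3x faster.
# HYPOTHETICAL self-hosting inputs, for illustration only:
DEPRECIATION_PER_DAY = 2.50   # dual-3090 rig cost spread over its useful life
POWER_PER_H = 0.15            # electricity while both cards are loaded

for rate in (1.49, 1.99):
    for speedup in (2, 3):
        hours = break_even_hours(DEPRECIATION_PER_DAY, POWER_PER_H, rate, speedup)
        print(f"${rate}/h, {speedup}x faster: self-hosting pays off above {hours:.1f} h/day")
```

With these placeholder inputs the break-even falls between roughly 3 and 7 hours a day, which is consistent with 2-3 h/day favoring the cloud; substituting your own hardware price and electricity rate redoes the comparison.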
llama.cpp adds support for Talkie-1930-13b-it, a 13B model trained on 260B tokens of pre-1931 English text. It was instruction-tuned via DPO, with an LLM as judge, on historical etiquette manuals and encyclopedias, and simulates conversations with historical personas.
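DPO trains directly on (chosen, rejected) response pairs, which here were presumably ranked by the LLM judge, without fitting a separate reward model. A minimal per-pair version of the standard DPO loss follows. It assumes the summed log-probabilities of each response under the policy and a frozen reference model are already computed; β = 0.1 is a common default, not a value reported for Talkie.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is a summed log-probability log p(y | x) of a full response,
    under either the trainable policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the chosen response over the rejected one.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin), computed in a numerically stable way.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy still matches the reference, the margin is zero and the loss is log 2. In training, the value is averaged over a batch and gradients flow only through the policy log-probabilities.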
A theoretical paper on optimizing the interaction between users and AI recommendation systems. It models a communication cost (the precision of the user's message) and a search cost (the size of the recommendation set). For large d, it characterizes how the optimal message precision and recommendation-set size depend on the cost parameters under two sampling schemes: sampling from the posterior belief and sampling from an optimized tilted distribution.
A femtosecond-laser-pumped Coherent Ising Machine (CIM) is integrated with an LLM-driven agentic system built on LangGraph and LangChain. The LLMs automatically calibrate QUBO/Ising models, iterate on constraint weights, and validate candidate schemes. The system runs entirely on domestic Chinese models and hardware.
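For context on what "calibrating a QUBO" involves: a QUBO minimizes xᵀQx over binary x, and hard constraints are folded in as penalty terms whose weights must be tuned, which is the step the LLM agents automate here. The toy sketch below uses a hypothetical "pick exactly one item" problem and brute force in place of a CIM. It builds the penalty, converts the QUBO to Ising form with x_i = (1 + s_i)/2, and checks that both formulations agree. It is not the authors' pipeline.

```python
from itertools import product


def qubo_energy(Q, x):
    """E(x) = sum_ij Q[i][j] x_i x_j for binary x_i in {0, 1}."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def one_hot_qubo(costs, penalty):
    """Minimize sum_i c_i x_i subject to sum_i x_i == 1.

    The constraint becomes penalty * (sum_i x_i - 1)^2. Expanding with
    x_i^2 = x_i gives -penalty on the diagonal and +2*penalty per pair
    (split as Q[i][j] = Q[j][i] = penalty); the constant +penalty is
    dropped because it does not change the argmin.
    """
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = costs[i] - penalty
        for j in range(n):
            if i != j:
                Q[i][j] = penalty
    return Q


def qubo_to_ising(Q):
    """Substitute x_i = (1 + s_i) / 2, giving
    E(s) = offset + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j with s_i in {-1, +1}."""
    n = len(Q)
    h, J, offset = [0.0] * n, {}, 0.0
    for i in range(n):
        h[i] += Q[i][i] / 2
        offset += Q[i][i] / 2
        for j in range(i + 1, n):
            w = Q[i][j] + Q[j][i]
            J[(i, j)] = w / 4
            h[i] += w / 4
            h[j] += w / 4
            offset += w / 4
    return h, J, offset


def ising_energy(h, J, offset, s):
    return (offset + sum(hi * si for hi, si in zip(h, s))
            + sum(Jij * s[i] * s[j] for (i, j), Jij in J.items()))


costs = [3.0, 1.0, 2.0]                  # HYPOTHETICAL per-item costs
Q = one_hot_qubo(costs, penalty=10.0)    # large enough that breaking the constraint never pays
best_x = min(product((0, 1), repeat=3), key=lambda x: qubo_energy(Q, x))
h, J, off = qubo_to_ising(Q)
best_s = min(product((-1, 1), repeat=3), key=lambda s: ising_energy(h, J, off, s))
print(best_x, best_s)
```

If the penalty weight is set too low, selecting no item (energy 0) can beat a valid one-hot choice; this kind of trade-off is what iterating on constraint weights has to settle.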
MiniCPM5-1B, a 1-billion-parameter model weighing 0.5 GB, outperforms significantly larger models, suggesting that strong performance does not require massive scale.
Vercel rolls out a routing update for Microfrontends. Aliases created with `vc alias` now inherit the full routing configuration of their source deployment, and branch-assigned domains now route to that branch across all projects in the Microfrontend, not just the owning project.
ThriftAttention introduces selective mixed precision for FP4 attention over long contexts. By applying different precision levels to the most critical attention regions, it reduces memory consumption and accelerates inference.
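The general idea of selective mixed precision can be sketched as block-wise FP4 quantization with an escape hatch. Each block is scaled into the FP4 (E2M1) range and snapped to the nearest representable value, while blocks flagged as critical stay at higher precision. This is a generic illustration: the block size and the "critical" test (a hypothetical outlier ratio) are assumptions, not ThriftAttention's actual selection rule.

```python
# Non-negative magnitudes representable in FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4_block(block):
    """Fake-quantize one block: scale so the largest magnitude maps to 6.0
    (the FP4 maximum), snap to the nearest E2M1 value, then scale back.
    Ties go to the smaller magnitude."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return list(block)
    scale = amax / 6.0
    out = []
    for v in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(mag * scale if v >= 0 else -mag * scale)
    return out


def selective_quantize(values, block_size=4, outlier_ratio=3.0):
    """Quantize each block to FP4 unless it looks critical.

    HYPOTHETICAL criterion: a block is critical when its peak magnitude
    exceeds `outlier_ratio` times its mean magnitude, so one outlier would
    dominate the shared scale and flush the small values to zero. Since
    peak/mean can never exceed block_size, the ratio must be below it.
    Critical blocks are kept at full precision. Returns (values, n_kept).
    """
    out, kept = [], 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        mags = [abs(v) for v in block]
        mean = sum(mags) / len(mags)
        if mean > 0 and max(mags) > outlier_ratio * mean:
            out.extend(block)
            kept += 1
        else:
            out.extend(quantize_fp4_block(block))
    return out, kept
```

For example, `[1.0, 2.0, 3.0, 4.0]` is stored as FP4, while a block like `[0.01, 0.02, 0.01, 5.0]` is kept at full precision because quantizing it with a shared scale would round its three small values to zero.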
Japan has successfully tested a ramjet engine designed for Mach 5 aircraft. The trial validates hypersonic propulsion technology, a key milestone toward next-generation hypersonic aircraft.
The Financial Times reports that Heretic, a GitHub tool, can remove the guardrails from Llama 3.3 in under 10 minutes. Its creator, Philipp Emanuel Weidmann, confirms that 3,500 "decensored" models have been created and counts 13 million downloads since launch.
Anthropic releases claude-cookbooks, a collection of notebooks and recipes demonstrating practical and creative ways to use Claude.
llmfit is a CLI tool that tests hundreds of LLMs and providers against your hardware; a single command shows what can run locally.
Meetily is an open-source, self-hosted meeting assistant built in Rust. It offers transcription it claims is 4x faster than Whisper/Parakeet, speaker diarization, and Ollama-based summarization, with 100% local processing and no cloud required.
A learning repo, "MCP from Scratch", teaches the Model Context Protocol in plain Node.js, from raw JSON-RPC up to a working local agent loop (plan → act → observe) using node-llama-cpp and GGUF models. It is designed to expose the underlying mechanics without heavy abstractions.
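At the wire level, an MCP tool invocation is a JSON-RPC 2.0 request with method `tools/call` and `name`/`arguments` params. The sketch below is in Python rather than the repo's Node.js. It frames such requests and drives a plan → act → observe loop. The in-process server, its `add` tool, and the scripted `plan` function standing in for the LLM are hypothetical stand-ins, not code from the repo, and the `initialize` handshake and stdio transport of a real MCP session are skipped.

```python
import json


def make_request(req_id, method, params):
    """Frame a JSON-RPC 2.0 request, the message format MCP is built on."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})


# HYPOTHETICAL in-process server exposing a single tool named "add".
TOOLS = {"add": lambda args: args["a"] + args["b"]}


def handle_request(raw):
    """Answer tools/list and tools/call; anything else gets 'Method not found'."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif req["method"] == "tools/call":
        value = TOOLS[req["params"]["name"]](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}], "isError": False}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


def plan(goal, observations):
    """HYPOTHETICAL stand-in for the LLM: decide on a tool call or a final answer."""
    if not observations:
        return {"tool": "add", "arguments": {"a": goal[0], "b": goal[1]}}
    return {"answer": observations[-1]}


def agent_loop(goal, max_steps=5):
    observations = []
    for step in range(1, max_steps + 1):
        decision = plan(goal, observations)                          # plan
        if "answer" in decision:
            return decision["answer"]
        request = make_request(step, "tools/call",
                               {"name": decision["tool"], "arguments": decision["arguments"]})
        reply = json.loads(handle_request(request))                  # act
        observations.append(reply["result"]["content"][0]["text"])   # observe
    raise RuntimeError("step budget exhausted without an answer")


print(agent_loop((2, 3)))  # prints 5
```

In a real setup, `plan` would prompt the local GGUF model with the tool list and parse its reply, and `handle_request` would be replaced by writing the request to the server process and reading its response.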
A paper on disassembly scheduling for end-of-life aircraft proposes Constraint Programming and MIP models that handle thousands of tasks subject to precedence constraints, technician certifications, aircraft balance, and space limitations. The models are tested on real instances of up to 1,450 tasks from an industrial partner.
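Precedence constraints alone already bound the schedule: a task can start only after all of its predecessors finish. The sketch below computes these earliest start times with a topological pass (Kahn's algorithm) over a hypothetical four-task example. It deliberately ignores technicians, balance, and space, which are exactly what the paper's CP and MIP models add on top.

```python
from collections import deque


def earliest_starts(durations, preds):
    """Earliest start time of each task under precedence constraints only.

    durations: {task: duration}; preds: {task: [tasks that must finish first]}.
    Raises ValueError if the precedence graph contains a cycle.
    """
    succs = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
            indeg[t] += 1
    start = {t: 0 for t in durations}
    queue = deque(t for t in durations if indeg[t] == 0)
    done = 0
    while queue:
        t = queue.popleft()
        done += 1
        for s in succs[t]:
            # A successor cannot start before this task ends.
            start[s] = max(start[s], start[t] + durations[t])
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    if done != len(durations):
        raise ValueError("precedence graph has a cycle")
    return start


# HYPOTHETICAL tasks, durations in hours.
durations = {"drain_fuel": 2, "remove_engine_1": 5, "remove_engine_2": 5, "remove_pylons": 8}
preds = {"remove_engine_1": ["drain_fuel"], "remove_engine_2": ["drain_fuel"],
         "remove_pylons": ["remove_engine_1", "remove_engine_2"]}
starts = earliest_starts(durations, preds)
makespan = max(starts[t] + durations[t] for t in durations)
print(starts, makespan)
```

With unlimited technicians and space, the two engine removals run in parallel; once resource limits are added, they may have to be serialized, which is where constraint propagation and MIP solvers come in.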
An academic paper combines Dynamic Programming (DP) and Constraint Programming (CP) to solve the Partial Shop Scheduling Problem. DP serves as the primary search framework, while CP contributes global-constraint propagation. The approach also integrates anytime strategies and Large Neighborhood Search schemes.
llama.cpp gains a KV cache optimization that re-processes generated tokens into the cache right after generation instead of waiting for the next prompt, improving responsiveness. One user reports latency dropping from 5-30 s to near-instant with Qwen 3.6-35B on an RX 7900 XTX (~100 tps).
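The latency gain relies on prefix reuse: when the next prompt starts with tokens already in the KV cache, only the new suffix needs a forward pass. If the cache diverges from the history the next prompt contains, everything after the divergence point is recomputed, which can mean multi-second prefills on long chats. The example assumes a divergence caused by reasoning tokens that are generated but stripped from the history. A minimal sketch of the matching step with plain integer token IDs; it illustrates the idea and is not llama.cpp's code.

```python
def common_prefix_len(cached, prompt):
    """Number of leading tokens shared by the cache and the new prompt."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def tokens_to_prefill(cached, prompt):
    """Reuse the shared prefix; return only the tokens that need a forward pass.

    Cache entries past the shared prefix are stale and would be discarded.
    """
    return prompt[common_prefix_len(cached, prompt):]


turn1_prompt = [1, 15, 27, 9]                     # HYPOTHETICAL token IDs
generated = [44, 45, 46]                          # the model's visible reply
next_prompt = turn1_prompt + generated + [7, 8]   # history + new user message

# Cache already matches the history: only the new message is prefilled.
print(tokens_to_prefill(turn1_prompt + generated, next_prompt))               # [7, 8]
# Cache diverges (reasoning tokens 99, 98 were generated but are stripped
# from the history): the reply must be recomputed as well.
print(tokens_to_prefill(turn1_prompt + [99, 98] + generated, next_prompt))    # [44, 45, 46, 7, 8]
```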
OpenAI partners with Grupo Folha and Grupo UOL to bring trusted Brazilian journalism into ChatGPT, with transparent attribution of the content.
A user runs Qwen 3.6-35B-A3B-MTP on a GTX 1060 6GB via LM Studio. Setup: Q4_K_XL quantization, 131k context, and 41 layers offloaded to the GPU; prefill runs at 130-150 tps and decode at 16 tps, which is usable for chat on legacy hardware.
Armin Ronacher (creator of Pi) denounces poorly prompted, LLM-generated bug reports filed against his open-source project. These reports contain inaccurate but confident conclusions, fake minimal reproductions, and wrong guesses at root causes. He asks contributors to limit issues to observed facts: the command run, the expected outcome, the actual outcome, and the exact logs.
A llama.cpp user builds a secure web RAG workflow by enabling the server's native tools (exec_shell_command) behind several sandboxing layers: firejail, a dedicated Linux user, and an Alpine OCI container. This lets the Qwen 3.6-35B model run wget commands directly from the web UI to fetch and analyze content.
Anthropic releases an open-source repository of Claude plugins designed for knowledge workers. The plugins integrate Claude into professional and productivity workflows.
Aider is an AI pair-programming tool that runs in the terminal, letting developers write and edit code together with an AI directly from the command line.
Mathematician Adam Kucharski shows that Microsoft Copilot invents country-based stereotypes when analyzing identical datasets labeled with different countries. Reasoning models catch the trick, but only if users explicitly select them instead of relying on the default settings.
Anthropic will likely continue supplying Claude to the NSA even though the Pentagon has flagged it as a supply chain risk. Intelligence agencies lack Nvidia's latest Grace Blackwell chips, and Anthropic's "Mythos" model reportedly runs on older hardware. The controversial "any lawful use" clause is not part of the deal.
A developer builds a web GUI for TradingAgents, a multi-agent LLM framework for stock analysis. It replaces the CLI with a local interface supporting Ollama, OpenAI, Anthropic, Google, DeepSeek, and others, and adds live pipeline visualization, a report reader, a concise mode that cuts token usage by ~50%, and multi-session chat. Licensed under Apache 2.0.
A TTS benchmark compares all known models released through May 2026. Windows and Mac results are available, and Linux testing is underway. The GitHub repo includes an HTML results page.
A user computed embeddings for NVIDIA's Nemotron-Personas dataset (millions of synthetic personas) using Qwen 0.6B. The precomputed vectors enable semantic search and persona clustering, and are available on Hugging Face along with a web demo.
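Semantic search over precomputed embeddings reduces to ranking stored vectors by cosine similarity to the embedded query. The sketch below uses a plain linear scan and hypothetical 3-dimensional vectors and persona names. Real Qwen embeddings have far more dimensions, and at millions of rows a vector index would normally replace the scan; the ranking rule stays the same.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query, corpus, k=2):
    """Rank precomputed embeddings by cosine similarity to the query.

    Vectors are L2-normalized, so cosine similarity reduces to a dot product.
    """
    q = normalize(query)
    scored = []
    for name, vec in corpus.items():
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# HYPOTHETICAL embeddings; in practice these come from the precomputed dataset.
personas = {"retired teacher": [0.9, 0.1, 0.0],
            "school librarian": [0.8, 0.3, 0.1],
            "rally driver": [0.0, 0.2, 0.9]}
print(top_k([1.0, 0.2, 0.0], personas))  # ['retired teacher', 'school librarian']
```

Normalizing the stored vectors once at load time, rather than on every query as done here for brevity, is the usual optimization.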
NVFP4 and MTP can now be used together in llama.cpp (release b9297). Combining NVFP4 4-bit quantization with multi-token prediction (MTP) improves performance on NVIDIA GPUs.
A Chrome extension runs Gemini Nano (Gemma) locally on a PC without a GPU. It requires 16 GB of RAM and reaches ~20 tokens/s on a laptop, with a limit of 9,216 tokens per session. It can be installed in one click from the Chrome Web Store or from the GitHub repo.
A Python package installs prebuilt llama.cpp server binaries. It addresses portability: llama.cpp can be deployed as a local subprocess without documenting build steps. Available on PyPI and GitHub, it supports the standard llama.cpp flags as well as custom builds.
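Running llama.cpp as a local subprocess generally means launching `llama-server` with a model path and port, then polling its `/health` endpoint until the model is loaded. Below is a standard-library sketch. The binary path and model file are placeholders, and this does not use the package's own API, which the item does not describe.

```python
import subprocess
import time
import urllib.request


def build_server_cmd(binary, model, port=8080, ctx_size=4096, extra_args=()):
    """Command line for llama-server using its standard -m / -c / --host / --port flags."""
    return [binary, "-m", model, "-c", str(ctx_size),
            "--host", "127.0.0.1", "--port", str(port), *extra_args]


def start_server(cmd, port, timeout=120.0):
    """Launch the server and wait until GET /health returns 200."""
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    url = f"http://127.0.0.1:{port}/health"
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(f"llama-server exited with code {proc.returncode}")
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return proc
        except OSError:
            pass  # not listening yet, or still loading the model (HTTP 503)
        time.sleep(0.5)
    proc.terminate()
    raise TimeoutError("llama-server did not become healthy in time")
```

Typical use: `proc = start_server(build_server_cmd("./llama-server", "model.gguf"), port=8080)`, then send requests to the local server and call `proc.terminate()` when done. Extra flags such as `-ngl 99` for GPU offload go in `extra_args`.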
Benchmark of llama.cpp vs. Google's LiteRT on a custom 24/7 server built from a Xiaomi 12 Pro (Snapdragon 8 Gen 1). llama.cpp: 30.6 t/s prompt processing, 5.7 t/s generation, moderate CPU load. LiteRT: slightly faster generation, but it maxes out the CPU and draws more power. The build features copper/aluminum cooling, a custom safe PSU, and a 3D-printed case.
AgentLantern is an open-source devtool for AI agent projects with three features: documentation generation from source code, static linting to detect configuration issues, and a pixel-art runtime viewer. It initially supports CrewAI, with plans to extend to other frameworks.
Databricks releases ai-dev-kit, a toolkit for building coding agents. Maintained by Databricks Field Engineering, it provides components and patterns for building AI agents that generate and manipulate code.
Pydantic-AI is an open-source framework for building AI agents following Pydantic principles. It provides a structured approach to developing multi-agent systems with built-in data validation.
CrewAI is an open-source framework for orchestrating autonomous AI agents in collaborative roles. Agents with assigned roles cooperate on complex tasks.
Starting in summer 2026, UC Berkeley Law bans AI from nearly all graded work, including outlining, drafting, and proofreading; only research use is permitted. The rationale: future lawyers must learn to think independently before they can use AI meaningfully.
Google CEO Sundar Pichai reframes links as a "part" of search rather than its foundation. Google is pivoting from traffic distributor to AI publisher, keeping users within its ecosystem and exercising editorial power over source selection.
A user tests APEX quantization of Gemma 4 26B on an AMD RX 9060 XT GPU, reaching 38 tokens/s at 90k context with no observed quality degradation using llama.cpp's Vulkan backend. The APEX-I-Compact model (15 GB) outperforms the previous Q5 quant (21.2 GB), which looped at 50k context.