NVIDIA releases Nemotron 3 Ultra on HuggingFace: 550B total parameters, 55B active, a hybrid Mamba-Transformer MoE architecture, pretrained on 20T tokens with a 1M-token context window. The headline number is throughput: roughly 6× that of public LLMs with comparable accuracy, driven by the Mamba component cutting the cost of long sequences. Checkpoints, training data, and the full recipe (SFT + RL + multi-teacher distillation) are all open, making this one of the most complete open releases of the year at this model size. For teams running large-scale inference, the 10% active-to-total parameter ratio (55B of 550B) is the real cost lever.
CODA-BENCH (arXiv:2606.15300) lands at the right moment to reset expectations on data agents. 1,009 tasks built on the Kaggle ecosystem, ~980 files per environment, and the best agent tops out at 61.1% success on tasks that combine data discovery with code execution. This mirrors the gap seen on pure-code benchmarks two years ago, before SWE-bench forced a rethink of agent pipelines. CODA-BENCH will likely play the same role for data-science agents. Worth reading alongside PrologMCP (Claude Sonnet 4.6, GPT-4.1, o4-mini on PARARULE-Plus), which hits 0.99–1.00 accuracy on deductive reasoning by exposing Prolog as a stateful tool via MCP: agents aren't uniformly weak, they're weak on unstructured high-volume data tasks and near-perfect when reasoning is formalized upstream.
On the tooling side: quicktok encodes 4–11× faster than tiktoken with byte-identical output, using a 2-byte trie and dense caches in C++. In high-throughput pipelines that tokenize on the fly (RAG, batch preprocessing), this is the kind of CPU optimization that changes your cost profile without touching the rest of the stack. Supports cl100k, o200k, GPT-OSS, Llama-3, Qwen2.5/3.
CODA-BENCH is the first benchmark jointly evaluating code and data intelligence in AI agents. Built on the Kaggle ecosystem with 1,009 tasks and ~980 files per environment, it reveals that top agents achieve only 61.1% success rate when integrating data discovery with code execution.
PrologMCP exposes Prolog as a stateful tool via Model Context Protocol for LLM agents. Tested on PARARULE-Plus with Claude Sonnet 4.6, GPT-4.1, and o4-mini, the system achieves 1.00 accuracy on the general set and 0.99–1.00 on the challenging set, outperforming reasoning models on deductive tasks.
NVIDIA introduces Nemotron 3 Ultra, a 550B-parameter (55B active) hybrid Mamba-Transformer MoE model pretrained on 20T tokens with a 1M-token context length. The recipe combines SFT, RL, and multi-teacher distillation. Achieves ~6× the inference throughput of public LLMs with comparable accuracy. Base, post-trained, and quantized checkpoints, training data, and recipe are open-sourced on HuggingFace.
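To make the active-to-total ratio concrete, here is a back-of-the-envelope sketch in Python. The parameter counts come from the release; the "about 2 FLOPs per active parameter per token" rule of thumb is an assumption of this sketch, not a figure from NVIDIA, and it ignores attention and state-space costs that grow with sequence length.

```python
# Rough MoE arithmetic using the parameter counts quoted in the release.
TOTAL_PARAMS = 550e9   # total parameters
ACTIVE_PARAMS = 55e9   # parameters active per token


def active_ratio(active: float, total: float) -> float:
    """Fraction of the model's parameters used for a single token."""
    return active / total


def approx_forward_flops_per_token(params_used: float) -> float:
    """Assumed rule of thumb: ~2 FLOPs per parameter touched per token.

    This is a coarse estimate for the matrix multiplies only; it is not
    a measured number for this model.
    """
    return 2.0 * params_used


ratio = active_ratio(ACTIVE_PARAMS, TOTAL_PARAMS)
flops_moe = approx_forward_flops_per_token(ACTIVE_PARAMS)
flops_dense_equiv = approx_forward_flops_per_token(TOTAL_PARAMS)

print(f"active ratio: {ratio:.0%}")
print(f"approx FLOPs/token (MoE, active only): {flops_moe:.2e}")
print(f"approx FLOPs/token (hypothetical dense 550B): {flops_dense_equiv:.2e}")
```

The ratio lowers compute per token, not memory: all 550B parameters still have to be stored somewhere for the router to pick from, which is why the quantized checkpoints matter for deployment.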
quicktok is a BPE tokenizer written in C++ producing byte-identical tokens to tiktoken. Encodes 2–3.6× faster than bpe-openai and 4–11× faster than tiktoken itself. Supports cl100k, o200k, GPT-OSS, Llama-3, Qwen2.5/3. Optimizations: 2-byte trie, dense caches, hand-compiled pretokenizer.
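A claim like "byte-identical tokens, N× faster" is easy to check on your own corpus before swapping tokenizers. The sketch below is a generic harness, not quicktok's API: it takes any two `encode(text) -> list[int]` callables, and the two byte-level encoders used here are toy stand-ins so the example runs without extra dependencies. All function names are hypothetical.

```python
import time
from typing import Callable, List, Optional, Sequence

Encoder = Callable[[str], List[int]]


def first_mismatch(reference: Encoder, candidate: Encoder,
                   texts: Sequence[str]) -> Optional[int]:
    """Return the index of the first text whose token IDs differ, else None."""
    for i, text in enumerate(texts):
        if reference(text) != candidate(text):
            return i
    return None


def time_encoder(encode: Encoder, texts: Sequence[str], repeats: int = 3) -> float:
    """Best-of-N wall-clock time (seconds) to encode the whole corpus."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            encode(text)
        best = min(best, time.perf_counter() - start)
    return best


# Toy stand-ins: replace these with the real reference and candidate encoders.
def byte_encoder(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def byte_encoder_alt(text: str) -> List[int]:
    return [b for b in bytearray(text, "utf-8")]


corpus = ["hello world", "tokenizers: ünïcödé ✓", ""] * 100
mismatch = first_mismatch(byte_encoder, byte_encoder_alt, corpus)
print("identical output" if mismatch is None else f"differs at text #{mismatch}")
print(f"reference: {time_encoder(byte_encoder, corpus):.6f}s")
print(f"candidate: {time_encoder(byte_encoder_alt, corpus):.6f}s")
```

Run the equivalence check on a sample of real production text, including edge cases such as empty strings and mixed scripts, since speedups tend to depend on the input distribution.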
Two-layer transformers classify rational elliptic curves (rank 0 vs 1) with >99% accuracy from 128 Frobenius traces. Mechanistic interpretability analysis reveals that a sparse circuit of 20 neurons implements the Mestre-Nagao heuristic (weights log(p)/(p·log B), r=0.997), in effect rediscovering a result from analytic number theory on its own.
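For readers who have not met the heuristic, the weighting quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: it uses curves in short Weierstrass form y^2 = x^3 + ax + b, computes each Frobenius trace a_p by brute-force point counting, skips primes dividing the discriminant, and sums a_p · log(p)/(p·log B). Function names and the choice of B are hypothetical, and sign and normalization conventions for Mestre-Nagao sums vary between sources.

```python
import math
from typing import List


def primes_up_to(n: int) -> List[int]:
    """Sieve of Eratosthenes; assumes n >= 2."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def legendre(v: int, p: int) -> int:
    """Legendre symbol (v/p) for an odd prime p, via Euler's criterion."""
    v %= p
    if v == 0:
        return 0
    return 1 if pow(v, (p - 1) // 2, p) == 1 else -1


def frobenius_trace(a: int, b: int, p: int) -> int:
    """a_p = p + 1 - #E(F_p) for y^2 = x^3 + ax + b, p an odd prime of good reduction."""
    return -sum(legendre(x * x * x + a * x + b, p) for x in range(p))


def weighted_trace_sum(a: int, b: int, bound: int) -> float:
    """Sum of a_p * log(p) / (p * log(bound)) over good primes p <= bound."""
    disc = -16 * (4 * a ** 3 + 27 * b ** 2)
    total = 0.0
    for p in primes_up_to(bound):
        if disc % p == 0:  # bad reduction (always skips p = 2 in this form)
            continue
        total += math.log(p) / (p * math.log(bound)) * frobenius_trace(a, b, p)
    return total


# Example curve y^2 = x^3 - x; B = 500 is an arbitrary illustrative bound.
print(frobenius_trace(-1, 0, 5))
print(weighted_trace_sum(-1, 0, 500))
```

In the commonly cited form of the heuristic, more negative sums are associated with higher rank, which would explain why a small set of neurons with these weights is enough to separate rank 0 from rank 1.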