Poolside has released the technical report for Laguna M.1 (225.8B parameters, 23.4B active) and Laguna XS.2 (33.4B total, 3B active), two MoE models trained end-to-end for agentic coding. The benchmark suite (SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, Terminal-Bench 2.0) maps directly to what dev-agent teams actually evaluate. XS.2 ships under Apache 2.0, which lowers licensing friction for production deployment. At 3B active parameters, it competes head-on with the lightweight code-specialized models already deployed by several IDE vendors.
Two more papers drop the same day, targeting unrelated but equally critical bottlenecks. HQMQ (Hurwitz Quaternion Multiplicative Quantization) compresses the KV cache without calibration by treating each 4-element chunk as a Hurwitz quaternion: on Llama-3-70B the cache shrinks from 43 GB to 8.5 GB at fp16 quality, and the method is reported to outperform naive int4 by 3–1900× depending on the task. It is validated on Mistral-7B, Llama-3-8B, Qwen2.5/3-8B, and gpt-oss-20b. Separately, the latent prediction paper (covering data2vec- and JEPA-style training) formally proves that predicting one's own representations, rather than tokens, reduces sample complexity from exponential in depth L to constant. That offers theoretical grounding for why JEPA-style architectures can need less data than token-prediction models in data-limited regimes.
The search agent training study (arXiv:2605.27881) surfaces a systematic methodological bias in the literature: a substantial share of the gains reported on the Wikipedia 2018 corpus is explained by data coverage, not algorithmic differences. Outcome-based rewards consistently beat process-based approaches. This is a direct warning for anyone benchmarking RAG+RL pipelines on public datasets without controlling for corpus coverage.
Laguna M.1 (225.8B parameters, 23.4B activated) and Laguna XS.2 (33.4B total, 3B activated) are two MoE foundation models trained end-to-end for agentic coding. Competitive on SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, and Terminal-Bench 2.0. XS.2 released under Apache 2.0.
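To put the total and activated parameter counts side by side, the short sketch below computes the activated share for each model. It uses only the figures quoted above; the dictionary layout and the helper name `active_fraction` are illustrative choices, not anything taken from the report.

```python
# Parameter counts in billions, as quoted in the summary above.
models = {
    "Laguna M.1": {"total": 225.8, "active": 23.4},
    "Laguna XS.2": {"total": 33.4, "active": 3.0},
}


def active_fraction(total_b, active_b):
    """Share of a MoE model's parameters that is activated (hypothetical helper)."""
    return active_b / total_b


for name, p in models.items():
    share = active_fraction(p["total"], p["active"])
    print(f"{name}: {share:.1%} of parameters activated")
```

Both models activate roughly a tenth of their weights, which is why XS.2 is compared against small code models despite its 33.4B total size.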
Systematic study comparing state space models (SSMs) for time series classification. S4D outperforms Mamba variants in both accuracy and efficiency. Authors introduce MS4 and MS4N, lightweight S4D variants with linear input projection and channel mixing. Evaluation on 59 datasets (MONSTER, UEA): MS4N matches models with 10× more parameters.
HQMQ, a calibration-free KV cache compression method for LLMs, quantizes each 4-element chunk as a Hurwitz quaternion. Tested on Mistral-7B, Llama-3-8B, Qwen2.5/3-8B, and gpt-oss-20b: matches fp16 quality at ~5 bits, achieves up to 5.05× compression (Llama-3-70B: 43 GB → 8.5 GB), outperforms naive int4 by 3–1900×.
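For intuition about what "quantizing a 4-element chunk as a Hurwitz quaternion" can mean, here is a minimal sketch of plain nearest-point rounding onto the Hurwitz lattice. This is not the HQMQ algorithm (the multiplicative part and the bit allocation are not described in this summary); it only relies on the standard fact that Hurwitz quaternions have coordinates that are either all integers or all half-integers. The function names and the single `scale` parameter are assumptions made for illustration.

```python
import math


def nearest_hurwitz(chunk):
    """Round a 4-vector to the nearest Hurwitz quaternion.

    Candidates are the nearest all-integer point and the nearest
    all-half-integer point; the closer of the two is returned.
    """
    if len(chunk) != 4:
        raise ValueError("expected a 4-element chunk")
    int_pt = [float(round(v)) for v in chunk]
    half_pt = [math.floor(v) + 0.5 for v in chunk]
    d_int = sum((v - p) ** 2 for v, p in zip(chunk, int_pt))
    d_half = sum((v - p) ** 2 for v, p in zip(chunk, half_pt))
    return int_pt if d_int <= d_half else half_pt


def quantize(values, scale):
    """Quantize a flat list (length a multiple of 4) chunk by chunk.

    `scale` is a hypothetical step size; a real method would pick it
    per tensor or per channel.
    """
    out = []
    for i in range(0, len(values), 4):
        chunk = [v / scale for v in values[i:i + 4]]
        out.extend(p * scale for p in nearest_hurwitz(chunk))
    return out


# Compression ratio implied by the rounded sizes quoted above.
ratio = 43 / 8.5
print(f"{ratio:.2f}x")
```

The rounded 43 GB and 8.5 GB figures give a ratio of about 5.06, in line with the quoted "up to 5.05×".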
Theoretical paper on sample complexity of models predicting their own latent representations (data2vec, JEPA). Proves latent prediction reduces sample complexity from exponential in L (depth) to constant, versus token prediction. Validated on probabilistic grammars and neural networks.
Controlled empirical study on training search agents powered by LLMs. Authors isolate three dimensions: (1) data-coverage issue in Wikipedia 2018 corpus explains larger gains than algorithmic differences, (2) outcome-based rewards outperform process-based approaches, (3) analysis of training data diversity and search budget scaling. Code released.