Edition of 2026-05-28

Poolside releases Laguna XS.2 under Apache 2.0, while foundational research targets two core bottlenecks: KV cache memory at inference and sample complexity in training.

By the editorial team

Poolside has released the technical report for Laguna M.1 (225.8B parameters, 23.4B active) and Laguna XS.2 (33.4B total, 3B active), two MoE models trained end-to-end for agentic coding. The benchmark suite (SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, Terminal-Bench 2.0) covers the evaluations that dev-agent teams commonly run. XS.2 ships under Apache 2.0, which removes most licensing friction for production deployment. At 3B active parameters, it sits in the same weight class as the lightweight code-specialized models used in IDE tooling.

Two research papers land the same day, targeting unrelated bottlenecks. HQMQ (Hurwitz Quaternion Multiplicative Quantization) compresses the KV cache without calibration by treating each 4-element chunk as a Hurwitz quaternion. It is reported to match fp16 quality at ~5 bits and to reach up to 5.05× compression (Llama-3-70B: 43 GB → 8.5 GB), outperforming naive int4 by 3–1900× depending on the task, with tests on Mistral-7B, Llama-3-8B, Qwen2.5/3-8B, and gpt-oss-20b. Separately, the latent prediction paper (covering data2vec- and JEPA-style training) proves that predicting one's own representations reduces sample complexity from exponential in depth L to constant, relative to token prediction. That offers one theoretical account of why latent-prediction objectives can be more data-efficient than token-level ones.
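The 43 GB → 8.5 GB figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes the publicly documented Llama-3-70B shape (80 layers, 8 KV heads, head dimension 128) and a 128K-token context. Neither is stated in this digest, and the context length in particular is a guess chosen because it lands near 43 GB.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    """Bytes needed to cache keys and values for one sequence."""
    values = 2 * layers * kv_heads * head_dim * tokens  # 2 = keys and values
    return values * bits_per_value / 8


# Assumed Llama-3-70B shape: 80 layers, 8 KV heads (GQA), head dimension 128.
# The 128K-token context is an assumption, not a figure from the paper.
fp16_bytes = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                            tokens=128 * 1024, bits_per_value=16)
fp16_gb = fp16_bytes / 1e9

ratio = 43 / 8.5             # compression implied by the quoted sizes
effective_bits = 16 / ratio  # average bits per cached value, overhead included

print(f"fp16 KV cache: {fp16_gb:.1f} GB")
print(f"compression: {ratio:.2f}x, about {effective_bits:.2f} bits per value")
```

Under these assumptions the fp16 cache comes to roughly 43 GB, and the quoted sizes imply a little over 3 bits per cached value. That is below the ~5 bits at which the paper's entry reports fp16 parity, so the two figures may describe different operating points.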

The search agent training study (arXiv:2605.27881) surfaces a systematic methodological bias in the literature: a substantial share of reported gains on Wikipedia 2018 is explained by data coverage, not algorithmic differences. This is a direct warning for anyone benchmarking RAG+RL pipelines on public datasets without controlling for corpus coverage. The study also finds that outcome-based rewards outperform process-based approaches.

Today's 5 picks

arXiv cs.AI·SIG 82

Laguna M.1/XS.2 Technical Report

Laguna M.1 (225.8B parameters, 23.4B activated) and Laguna XS.2 (33.4B total, 3B activated) are two MoE foundation models trained end-to-end for agentic coding. Competitive on SWE-bench Verified, SWE-bench Multilingual, SWE-Bench Pro, and Terminal-Bench 2.0. XS.2 released under Apache 2.0.

AI Agents Code generation Benchmarks

arXiv cs.LG·SIG 82

A Simple State Space Model Excels at Multivariate Time Series Classification

Systematic study comparing state space models (SSMs) for time series classification. S4D outperforms Mamba variants in accuracy and efficiency. Authors introduce MS4 and MS4N, lightweight S4D variants with linear input projection and channel-mixing. Evaluation on 59 datasets (MONSTER, UEA): MS4N matches models with 10× more parameters.

Benchmarks Papers Reasoning

arXiv cs.LG·SIG 82

Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression

HQMQ, a calibration-free KV cache compression method for LLMs, quantizes each 4-element chunk as a Hurwitz quaternion. Tested on Mistral-7B, Llama-3-8B, Qwen2.5/3-8B, and gpt-oss-20b: matches fp16 quality at ~5 bits, achieves up to 5.05× compression (Llama-3-70B: 43 GB → 8.5 GB), outperforms naive int4 by 3–1900×.
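For readers unfamiliar with the term, a Hurwitz quaternion is a quaternion whose four components are either all integers or all half-integers. The sketch below shows one way a 4-element chunk could be snapped to the nearest such point. It is an illustration of the lattice only, not the HQMQ algorithm: the summary does not describe the multiplicative step, and the `scale` parameter and function names are hypothetical.

```python
import math


def nearest_hurwitz(chunk):
    """Return the Hurwitz quaternion closest to a 4-element chunk.

    Candidates are the nearest all-integer point and the nearest
    all-half-integer point; the closer of the two is returned.
    """
    if len(chunk) != 4:
        raise ValueError("expected exactly 4 components")
    integer_point = tuple(float(math.floor(x + 0.5)) for x in chunk)
    half_point = tuple(math.floor(x) + 0.5 for x in chunk)

    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(chunk, point))

    return min(integer_point, half_point, key=sq_dist)


def quantize_chunk(chunk, scale):
    """Snap chunk / scale to the lattice, then scale back.

    `scale` is a hypothetical step size, not a parameter from the paper.
    """
    snapped = nearest_hurwitz([x / scale for x in chunk])
    return tuple(scale * s for s in snapped)


print(nearest_hurwitz([0.4, 0.6, 0.5, 0.45]))
print(quantize_chunk([0.2, 0.3, 0.25, 0.22], scale=0.5))
```

Trying both candidate points is what separates this lattice from plain per-element rounding: a chunk whose components all sit near .5 is represented far more accurately by the half-integer point.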

Benchmarks Infrastructure Papers

arXiv cs.LG·SIG 82

Learn from your own latents and not from tokens: A sample-complexity theory

Theoretical paper on the sample complexity of models that predict their own latent representations (data2vec, JEPA). Proves that latent prediction reduces sample complexity from exponential in depth L to constant, relative to token prediction. Validated on probabilistic grammars and neural networks.

Papers Reasoning Evals

arXiv cs.CL·SIG 78

Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

Controlled empirical study on training LLM-powered search agents. Authors isolate three dimensions: (1) retrieval, where a data-coverage issue in the Wikipedia 2018 corpus explains larger gains than algorithmic differences; (2) reward, where outcome-based rewards outperform process-based approaches; (3) training protocol, with an analysis of training data diversity and search budget scaling. Code released.
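To make the two reward styles and the coverage control concrete, here is a minimal sketch. It does not reproduce the study's code: the exact-match rule, the per-step scores, and all function names are assumptions chosen for illustration.

```python
import string


def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def outcome_reward(final_answer, gold_answers):
    """1.0 if the final answer exactly matches any gold answer, else 0.0."""
    return float(normalize(final_answer) in {normalize(g) for g in gold_answers})


def process_reward(step_scores):
    """Mean of per-step scores, e.g. from a hypothetical step-level judge."""
    return sum(step_scores) / len(step_scores) if step_scores else 0.0


def answer_covered(gold_answers, corpus_passages):
    """True if any gold answer string appears in any corpus passage."""
    passages = [normalize(p) for p in corpus_passages]
    return any(normalize(g) in p for g in gold_answers for p in passages)


corpus = ["The capital of France is Paris.", "Berlin is in Germany."]
print(outcome_reward("Paris.", ["paris"]))
print(process_reward([1.0, 0.0, 0.5]))
print(answer_covered(["Canberra"], corpus))
```

Reporting results separately for questions where `answer_covered` is true and where it is false is one simple way to check whether a gain comes from the training method or from what the corpus happens to contain.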

AI Agents RAG Reinforcement learning