Edition of 2026-06-04

Agents in production: from city-scale mapping to web automation, operational AI is converging on architecture patterns

Two of today's papers deal with agents in real production, not lab benchmarks. MapAgent (arXiv:2606.04513, Baidu Maps) is the most concrete case: a Judge-Planner-Worker loop deployed across 360+ Chinese cities, with 95% automation measured on lane-level map generation. What's notable isn't the raw performance but the architecture: an explicit separation between visual perception, specification verification, and deterministic editing. SGDR follows the same logic on the web agent side, dynamically retrieving sub-procedures grounded in the current page state rather than drawing on a static skill library. It reaches 37.5% success on WebArena with GPT-4.1, +10.6 points over baseline. Both systems converge on the same principle: a pure generalist agent doesn't scale, and you need specialized roles with explicit state.
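To make the role separation concrete, here is a minimal sketch of a Judge-Planner-Worker loop over explicit state. It is not MapAgent's implementation: the lane-width rule, the `MIN_WIDTH` threshold, and every function name are hypothetical stand-ins, and the visual perception stage is left out entirely. The only point is the shape of the loop, where verification, planning, and deterministic editing stay in separate functions.

```python
from dataclasses import dataclass, field

MIN_WIDTH = 2.5  # hypothetical spec threshold in metres, for illustration only


@dataclass
class MapState:
    """Explicit shared state: toy lane widths keyed by lane id (hypothetical)."""
    lanes: dict
    log: list = field(default_factory=list)


def judge(state):
    """Specification verification: return ids of lanes that violate the spec."""
    return sorted(k for k, w in state.lanes.items() if w < MIN_WIDTH)


def plan(violations):
    """Planner: turn each violation into an explicit edit instruction."""
    return [("set_width", lane_id, MIN_WIDTH) for lane_id in violations]


def work(state, edits):
    """Worker: apply edits deterministically, with no model call involved."""
    for op, lane_id, value in edits:
        if op == "set_width":
            state.lanes[lane_id] = value
            state.log.append(f"{op}:{lane_id}")


def run_loop(state, max_rounds=3):
    """Judge, plan, and edit until the spec passes; return editing rounds used."""
    for round_no in range(1, max_rounds + 1):
        violations = judge(state)
        if not violations:
            return round_no - 1
        work(state, plan(violations))
    return max_rounds


state = MapState(lanes={"a": 3.5, "b": 2.0, "c": 1.8})
rounds = run_loop(state)
```

Because the worker only executes instructions the planner wrote down, every change ends up in `state.log`, which is what makes this kind of loop auditable.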

On the inference side, SparDA (arXiv:2606.04511, NVlabs) and Recover-LoRA (arXiv:2606.04238) attack inference cost from opposite ends. SparDA adds a fourth projection per layer (Forecast) to predict which KV blocks the next layer will need, overlapping CPU-GPU prefetching with current execution. The reported result is a 1.7× decode speedup and up to 5.3× throughput on 8B models in long-context settings. Recover-LoRA works on the weights instead: aggressive 2-bit quantization with a mixed W2/W4 strategy on MLP layers, then accuracy recovery via logit distillation on 10k synthetic samples. On Qwen3-4B, it recovers 80–95% of accuracy with a +7.5–23.3% throughput gain. The two papers are complementary: SparDA optimizes attention over long contexts, while Recover-LoRA compresses weights without sacrificing quality. They are potentially stackable.
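The overlap idea behind forecast-driven prefetching can be sketched with a background thread standing in for the CPU-to-GPU copy. This is a toy under stated assumptions, not SparDA's code: `forecast` here is a fixed rule rather than a learned projection, the KV blocks are plain strings, and `fetch`, `attend`, and `decode_step` are hypothetical names. What it shows is the ordering: the copy for layer n+1 is launched before layer n is computed, so the two can run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for host memory: every layer has 8 KV blocks on the "CPU".
CPU_KV = {layer: {b: f"kv[{layer}][{b}]" for b in range(8)} for layer in range(4)}


def forecast(layer):
    """Hypothetical forecast: guess which blocks the NEXT layer will need."""
    return [(layer + 1) % 8, (layer + 3) % 8]


def fetch(layer, block_ids):
    """Simulated CPU-to-GPU copy of the selected blocks."""
    return {b: CPU_KV[layer][b] for b in block_ids}


def attend(layer, gpu_blocks):
    """Simulated sparse attention over the blocks already on the 'GPU'."""
    return f"layer{layer}:" + ",".join(str(b) for b in sorted(gpu_blocks))


def decode_step(num_layers=4):
    outputs = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The first layer has no earlier forecast, so use a fixed block set.
        pending = pool.submit(fetch, 0, [0, 1])
        for layer in range(num_layers):
            gpu_blocks = pending.result()  # wait for this layer's prefetch
            if layer + 1 < num_layers:
                # Launch the next layer's copy before computing this layer.
                pending = pool.submit(fetch, layer + 1, forecast(layer))
            outputs.append(attend(layer, gpu_blocks))
    return outputs
```

In this sketch a wrong forecast would simply leave the next layer with the wrong blocks; how a real system handles mispredictions is not something the summary above specifies.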

Curation-Bench (arXiv:2606.04261) is the most underrated signal of the day. The evaluation shows that generalist agents reach published baselines in ten iterations on training data curation tasks but, without scaffolding, stay stuck on local variants. With method citation and adaptation, an agent autonomously composes a policy that beats baselines using 10× less data. This isn't a result about the quality of the data produced; it's a result about agents' ability to automate the ML pipeline itself. Worth tracking for teams still spending significant time on dataset preparation.

Signal IA