arXiv cs.LG·28 May 2026

A Simple State Space Model Excels at Multivariate Time Series Classification

In three lines: A systematic study compares state space models (SSMs) for time series classification. S4D outperforms Mamba variants in accuracy and efficiency. The authors introduce MS4 and MS4N, lightweight S4D variants with a linear input projection and channel mixing. Evaluated on 59 datasets (MONSTER, UEA), MS4N matches models 10× larger in parameter count.

## S4D Beats Mamba on Time Series Classification: What It Actually Means

### 1. The Core Finding and Why It's Surprising

Since 2023, Mamba has been the de facto reference SSM architecture, driven by its input-dependent state transitions (selective state spaces). The community broadly assumed that this added complexity — selectivity, hardware-aware scanning, dynamic parameters — mechanically translated into performance gains. This paper invalidates that assumption for a specific domain: multivariate time series classification (TSC).

S4D, a diagonal SSM with *fixed* (non-input-dependent) parameters, consistently outperforms Mamba variants in both accuracy and efficiency across 59 datasets. This is not a marginal result on a niche benchmark: the evaluation spans MONSTER (up to 60 million samples, sequences of 50,000 timesteps, 82 classes) and the UEA suite, against 15 baselines. It is the largest comparative study of SSMs for TSC to date.
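To make "fixed parameters" concrete, here is a minimal NumPy sketch of a single diagonal SSM channel. It is an illustration rather than the paper's code: the zero-order-hold discretization, the S4D-Lin-style initialization `A_n = -1/2 + iπn`, the step size `dt`, and all function names are assumptions chosen for the example. Because nothing depends on the input, the step-by-step recurrence collapses into one precomputed convolution kernel, which is where much of the efficiency comes from.

```python
import numpy as np

def discretize_zoh(A, B, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM."""
    A_bar = np.exp(dt * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_kernel(A_bar, B_bar, C, length):
    """Convolution kernel K[l] = Re(sum_n C_n * A_bar_n**l * B_bar_n)."""
    powers = A_bar[None, :] ** np.arange(length)[:, None]   # (length, N)
    return (powers * (B_bar * C)[None, :]).sum(axis=1).real

def ssm_recurrent(A_bar, B_bar, C, u):
    """Step-by-step view: x_k = A_bar * x_{k-1} + B_bar * u_k, y_k = Re(C . x_k)."""
    x = np.zeros_like(A_bar)
    ys = []
    for u_k in u:
        x = A_bar * x + B_bar * u_k
        ys.append((C * x).sum().real)
    return np.array(ys)

def ssm_convolutional(A_bar, B_bar, C, u):
    """Same map computed as a causal convolution with the fixed kernel."""
    K = ssm_kernel(A_bar, B_bar, C, len(u))
    return np.convolve(u, K)[: len(u)]

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N = 8
A = -0.5 + 1j * np.pi * np.arange(N)          # assumed S4D-Lin-style diagonal
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A_bar, B_bar = discretize_zoh(A, B, dt=0.01)

u = rng.standard_normal(64)
y_rec = ssm_recurrent(A_bar, B_bar, C, u)
y_conv = ssm_convolutional(A_bar, B_bar, C, u)
print(np.allclose(y_rec, y_conv))
```

The two outputs should agree to floating-point precision. In practice the convolution would be evaluated with an FFT, not `np.convolve`, but the equivalence is the same.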

### 2. MS4 and MS4N: What Changes Architecturally

The authors go beyond comparison and propose two lightweight modifications to S4D:

- **MS4**: adds a linear input projection and a channel-mixing mechanism. Channel-mixing is critical for multivariate series where inter-channel correlations carry discriminative information (e.g., IMU sensors, multi-electrode EEG).
- **MS4N**: a normalized variant of MS4 that stabilizes state dynamics with negligible overhead. The normalization targets the classic SSM problem on long sequences: hidden state drift that degrades accuracy over 50K-timestep horizons.
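The two ingredients named above can be sketched as a single forward pass. This is a hypothetical reconstruction from the summary's description only, not the authors' implementation: the layer order, the ReLU, the global average pooling, the parameter shapes, and every name (`ms4_like_forward`, `W_in`, `W_mix`, `W_out`) are assumptions. The summary does not say which normalization MS4N uses, so the `normalize` flag applies a plain per-timestep normalization across features purely as a stand-in.

```python
import numpy as np

def s4d_kernels(C, length, dt=0.01):
    """Fixed convolution kernels, one per feature. C: (H, N) complex -> (H, length) real."""
    n_state = C.shape[1]
    A = -0.5 + 1j * np.pi * np.arange(n_state)   # assumed diagonal initialization
    A_bar = np.exp(dt * A)                        # zero-order hold
    B_bar = (A_bar - 1.0) / A                     # input vector B fixed to ones
    powers = A_bar[:, None] ** np.arange(length)[None, :]   # (N, length)
    return ((C * B_bar) @ powers).real

def causal_conv(h, K):
    """Causal convolution of each row of h with the matching row of K, via FFT."""
    length = h.shape[-1]
    n_fft = 2 * length                            # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft(h, n=n_fft) * np.fft.rfft(K, n=n_fft)
    return np.fft.irfft(spectrum, n=n_fft)[..., :length]

def ms4_like_forward(x, params, normalize=False):
    """x: (D, L) multivariate series -> (num_classes,) logits."""
    h = params["W_in"] @ x                        # linear input projection: D channels -> H features
    y = causal_conv(h, s4d_kernels(params["C"], x.shape[1]))  # independent SSM per feature
    y = np.maximum(y, 0.0)                        # ReLU (assumed nonlinearity)
    z = params["W_mix"] @ y                       # channel mixing across the H features
    if normalize:                                 # stand-in for the unspecified MS4N normalization
        z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-5)
    pooled = z.mean(axis=1)                       # global average over time
    return params["W_out"] @ pooled

def init_params(D, H, N, num_classes, seed=0):
    """Random parameters for the toy example (shapes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.standard_normal((H, D)) / np.sqrt(D),
        "C": rng.standard_normal((H, N)) + 1j * rng.standard_normal((H, N)),
        "W_mix": rng.standard_normal((H, H)) / np.sqrt(H),
        "W_out": rng.standard_normal((num_classes, H)) / np.sqrt(H),
    }

# Toy input: 6 sensor channels, 200 timesteps, 4 classes.
params = init_params(D=6, H=16, N=8, num_classes=4)
x = np.random.default_rng(1).standard_normal((6, 200))
logits = ms4_like_forward(x, params, normalize=True)
print(logits.shape)
```

The point of the sketch is structural: each SSM only ever sees one feature, so any cross-channel information has to enter through `W_in` or `W_mix`. That is why channel mixing matters for multivariate inputs.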

The key MS4N result: it matches or surpasses competing deep learning models that are **2× and 10× larger in parameter count**. Competitive performance without scaling has direct implications for inference costs and embedded deployment constraints.

### 3. Why S4D Wins Where Mamba Doesn't

The most plausible explanation lies in the nature of TSC vs. generative sequence modeling. Mamba was optimized for tasks where contextual selectivity is critical: text generation, where the model must dynamically ignore certain tokens and retain others. TSC is a discriminative, global task — the goal is a whole-sequence representation, not token-by-token prediction. S4D's fixed parameters are sufficient to capture the relevant frequency and temporal dynamics, without the computational overhead of selective scanning.
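The fixed-versus-selective distinction can be seen in a toy recurrence. The sketch below is not Mamba: it makes only the step size input-dependent (real selective SSMs also make other parameters depend on the input), and the `softplus` gating, the constants, and the function names are assumptions for illustration. With a fixed step the model is linear and time-invariant, so superposition holds and a single kernel describes it. With an input-dependent step that property is lost, and the model has to be evaluated as a scan.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def run_ssm(u, A, C, step_fn):
    """Diagonal SSM recurrence where the step size may depend on the current input."""
    x = np.zeros_like(A)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        dt = step_fn(u_k)
        A_bar = np.exp(dt * A)            # zero-order hold with the current step
        B_bar = (A_bar - 1.0) / A         # input vector B fixed to ones
        x = A_bar * x + B_bar * u_k
        y[k] = (C * x).sum().real
    return y

def fixed_step(u_k):
    """S4D-style: the step ignores the input."""
    return 0.01

def selective_step(u_k):
    """Toy selectivity: the step is a function of the input (assumed form)."""
    return 0.01 * softplus(u_k)

rng = np.random.default_rng(0)
N = 8
A = -0.5 + 1j * np.pi * np.arange(N)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u1 = rng.standard_normal(128)
u2 = rng.standard_normal(128)

def superposition_holds(step_fn):
    """Check whether f(u1 + u2) == f(u1) + f(u2)."""
    lhs = run_ssm(u1 + u2, A, C, step_fn)
    rhs = run_ssm(u1, A, C, step_fn) + run_ssm(u2, A, C, step_fn)
    return bool(np.allclose(lhs, rhs))

print("fixed:", superposition_holds(fixed_step))
print("selective:", superposition_holds(selective_step))
```

Whether the extra expressiveness pays off is task-dependent; the sketch only shows what is given up structurally when parameters depend on the input.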

Additionally, MONSTER includes extremely long sequences (up to 50K timesteps). At these horizons, the fact that SSMs avoid the quadratic cost of attention in sequence length is a structural advantage for the whole family. Mamba's selectivity, however, adds input-dependent parameters, and the extra variance they introduce may hurt generalization when training data is limited relative to dimensionality.
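A back-of-the-envelope calculation shows the scale of the gap at 50K timesteps. The state size and float width below are assumptions picked for the arithmetic, not figures from the paper, and the comparison counts only the dominant term for one attention head and one SSM feature.

```python
seq_len = 50_000        # longest sequences mentioned above
state_size = 64         # assumed SSM state size per feature (hypothetical)
bytes_per_float = 4     # float32

# Full self-attention materializes one score per pair of timesteps.
attention_entries = seq_len ** 2
attention_bytes = attention_entries * bytes_per_float

# A diagonal SSM recurrence touches each state element once per timestep.
ssm_state_updates = seq_len * state_size

ratio = attention_entries / ssm_state_updates

print(f"attention score entries: {attention_entries:,}")     # 2,500,000,000
print(f"attention score memory:  {attention_bytes / 1e9:.1f} GB")  # 10.0 GB
print(f"SSM state updates:       {ssm_state_updates:,}")     # 3,200,000
print(f"ratio:                   {ratio:.2f}x")              # 781.25x
```

Efficient attention variants reduce this cost, so the figure is an upper bound on the naive case, not a claim about any specific baseline in the study.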

### 4. Losers and Practical Implications

**Direct losers:** Mamba-specialized TSC architectures — several 2023-2024 papers proposed Mamba adaptations for TSC — now have a weakened justification. If vanilla S4D outperforms Mamba, the additional engineering of those variants is hard to defend.

**Indirect losers:** Transformer-based approaches for long-sequence TSC (Informer, PatchTST, etc.) remain under pressure. MS4N matching deep learning models up to 10× larger in parameter count suggests that scaling alone is not the right direction for this domain.

**Winners:** Practitioners deploying TSC on constrained hardware (edge, industrial IoT, medical wearables) now have a solid, lightweight, and thoroughly evaluated baseline. MS4N provides a credible starting point without the training resource requirements of large models.

**What remains open:** The paper does not address TSC in few-shot regimes or cross-domain transfer learning. S4D/MS4N are trained from scratch on each benchmark — the question of pre-trained generalizability is untouched. Furthermore, MONSTER and UEA primarily cover sensor/medical/HAR domains; performance on high-frequency financial series or NLP-adjacent tasks remains to be established.

On reproducibility: the abstract does not mention public code release — a key point to monitor for practical adoption.

Summary generated by Claude — human-verified
