arXiv cs.AI·19 May 2026

Ensembling Tabular Foundation Models - A Diversity Ceiling And A Calibration Trap


In three lines

Six modern tabular foundation models form a highly redundant ensemble (mean Q-statistic 0.961). On 153 OpenML classification tasks, the best ensemble (two-level cascade stacking) gains +0.18% accuracy at 253× the compute cost. Friedman-Nemenyi analysis places three ensembles and the best single model in the same equivalence group. Greedy selection is recommended as the practical default.
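The summary does not define the Q-statistic. A minimal sketch follows, assuming it is Yule's Q as commonly used in the ensemble-diversity literature: Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10), where N11 counts samples both models classify correctly, N00 samples both get wrong, and N10/N01 samples only one gets right. The function names `q_statistic` and `mean_pairwise_q` and the toy data are illustrative, not taken from the paper.

```python
from itertools import combinations

import numpy as np


def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers, given per-sample correctness flags.

    Assumption: this is the pairwise Q-statistic the summary refers to.
    Q is near +1 when the models err on the same samples (redundant)
    and near 0 when their errors are statistically independent.
    """
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n11 = int(np.sum(a & b))    # both correct
    n00 = int(np.sum(~a & ~b))  # both wrong
    n10 = int(np.sum(a & ~b))   # only A correct
    n01 = int(np.sum(~a & b))   # only B correct
    den = n11 * n00 + n01 * n10
    if den == 0:
        return float("nan")     # undefined for degenerate contingency tables
    return (n11 * n00 - n01 * n10) / den


def mean_pairwise_q(correct_matrix):
    """Average Q over all model pairs; rows are models, columns are samples."""
    rows = np.asarray(correct_matrix, dtype=bool)
    qs = [q_statistic(rows[i], rows[j])
          for i, j in combinations(range(len(rows)), 2)]
    return float(np.nanmean(qs))


# Toy correctness flags for three hypothetical models on eight samples.
toy = [
    [1, 1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 0, 1],
]
print(q_statistic(toy[0], toy[1]))
print(mean_pairwise_q(toy))
```

Under this definition, a mean of 0.961 across model pairs would indicate that the models make mistakes on almost the same samples, which leaves little room for an ensemble to correct them.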


Benchmarks Papers

Summary generated by Claude — human-verified

