arXiv cs.LG·22 May 2026

Provable Joint Decontamination for Benchmarking Multiple Large Language Models


In three lines: JECS (Joint Envelope Conformal Selection) is a method for decontaminating LLM evaluation benchmarks by controlling the global contamination rate (GCR) across multiple models. It aggregates per-model conformal p-values and applies an adaptive Benjamini-Hochberg procedure to select a benchmark with provable fairness guarantees and higher power than baseline approaches.
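The pipeline the summary describes (per-model conformal p-values, aggregation across models, then an adaptive Benjamini-Hochberg step) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the max-over-models aggregation, the Storey-style null-proportion estimate, and the convention that a higher score means "looks more contaminated" are all assumptions made here for illustration. Each item's null hypothesis is taken to be "contaminated for at least one model", so rejecting it keeps the item in the benchmark.

```python
from typing import List, Sequence


def conformal_p_value(score: float, calib_scores: Sequence[float]) -> float:
    """Conformal p-value of one item for one model (hypothetical helper).

    Assumes calib_scores come from items known to be contaminated, and that
    a higher score means "looks more contaminated". A small p-value is then
    evidence that the item is clean for this model.
    """
    n = len(calib_scores)
    return (1 + sum(1 for c in calib_scores if c <= score)) / (n + 1)


def aggregate_max(p_by_model: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model p-values item by item using the maximum.

    Assumption: the max is a standard conservative choice when the null is
    "contaminated for at least one model". The paper's joint envelope may
    aggregate differently.
    """
    return [max(ps) for ps in zip(*p_by_model)]


def adaptive_bh(p_values: Sequence[float], alpha: float = 0.1,
                lam: float = 0.5) -> List[int]:
    """Benjamini-Hochberg with a Storey-style estimate of the null fraction.

    Returns the sorted indices of rejected nulls, i.e. items kept as clean.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Estimate the fraction of true nulls from p-values above lam, capped at 1.
    pi0 = min(1.0, (1 + sum(1 for p in p_values if p > lam)) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank that passes its threshold.
        if p_values[i] <= rank * alpha / (m * pi0):
            k = rank
    return sorted(order[:k])
```

For example, `adaptive_bh(aggregate_max(p_by_model), alpha=0.1)` would return the indices of items retained in the decontaminated benchmark under these assumptions, where `p_by_model` holds one list of p-values per model.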


Benchmarks, Evals, AI safety

Summary generated by Claude — human-verified

