arXiv cs.LG

Provable Joint Decontamination for Benchmarking Multiple Large Language Models

Signal: 78
Hype: 15
In three lines
JECS (Joint Envelope Conformal Selection) is a method for decontaminating LLM evaluation benchmarks by controlling the global contamination rate (GCR) across multiple models. It aggregates per-model conformal p-values and applies an adaptive Benjamini-Hochberg procedure to select a benchmark with provable fairness guarantees and higher power than baseline approaches.
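
The pipeline described in the summary (per-model conformal p-values, aggregation, adaptive Benjamini-Hochberg) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the summary does not specify the joint envelope, so the per-item maximum across models is used here as a stand-in assumption, and the Storey-style null-proportion estimate is one common way to make Benjamini-Hochberg adaptive. All function names, the score convention (higher means more evidence that an item is clean), and the parameters `alpha` and `lam` are hypothetical.

```python
from typing import List, Sequence


def conformal_p_value(calib_scores: Sequence[float], test_score: float) -> float:
    """Split-conformal p-value: (1 + #{calibration >= test}) / (n + 1).

    Assumption: calibration scores come from reference items under the null
    (here: contaminated items), and a higher score means "looks cleaner".
    """
    n_ge = sum(1 for s in calib_scores if s >= test_score)
    return (1 + n_ge) / (len(calib_scores) + 1)


def aggregate_max(p_by_model: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in aggregation (NOT the paper's envelope): per-item maximum
    over models, so an item gets a small p-value only if every model does."""
    return [max(column) for column in zip(*p_by_model)]


def storey_pi0(p_values: Sequence[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the proportion of true nulls, capped at 1."""
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, (1 + above) / (m * (1 - lam)))


def adaptive_bh(p_values: Sequence[float], alpha: float = 0.1,
                lam: float = 0.5) -> List[int]:
    """Benjamini-Hochberg step-up with the level rescaled by 1 / pi0.

    Returns the sorted indices of selected (rejected-null) items.
    """
    m = len(p_values)
    pi0 = storey_pi0(p_values, lam)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / (pi0 * m):
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    # Toy p-values for two hypothetical models over five benchmark items.
    p_model_a = [0.001, 0.002, 0.90, 0.80, 0.003]
    p_model_b = [0.001, 0.001, 0.70, 0.60, 0.002]
    joint = aggregate_max([p_model_a, p_model_b])
    print(adaptive_bh(joint, alpha=0.1))
```

Taking the maximum is deliberately conservative: an item is kept only when the evidence holds for every model, which matches the idea of one shared benchmark but may give up power relative to whatever aggregation the paper actually uses.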
Benchmarks · Evals · AI safety

Summary generated by Claude — human-verified