arXiv cs.CL · 1 June 2026

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

In three lines: COFT is a training-free decoding method that reduces bias in LLM chain-of-thought generation. It uses masked counterfactual prompts and logit fusion to attenuate attribute-driven biases, with distribution-free marginal validity guarantees. Evaluated across 6 models, it reduces bias by 30-55% (median 38%) with negligible utility loss and at most 11% computational overhead.
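
To make the two ingredients named above concrete, here is a minimal sketch and not the paper's implementation. The summary does not give COFT's fusion rule or say how it calibrates its guarantee, so everything here is an assumption: `fuse_logits` uses plain linear interpolation between the logits from the original prompt and from an attribute-masked prompt, and `conformal_threshold` is the standard split-conformal quantile. The function names and the parameters `alpha` and `delta` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(factual, counterfactual, alpha=0.5):
    """Blend next-token logits from the original prompt with logits from a
    prompt whose sensitive attribute has been masked.

    ASSUMPTION: linear interpolation is only an illustrative fusion rule;
    the paper may use a different one.
    """
    return [(1 - alpha) * f + alpha * c for f, c in zip(factual, counterfactual)]


def conformal_threshold(calibration_scores, delta=0.1):
    """Standard split-conformal quantile: the ceil((n + 1) * (1 - delta))-th
    smallest calibration score, or infinity when n is too small.

    ASSUMPTION: how COFT applies such a threshold is not stated in the summary.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


# Toy example: the attribute pushes the first token up by 2 logits;
# fusing with the masked prompt halves that push.
fused = fuse_logits([2.0, 0.0], [0.0, 0.0], alpha=0.5)
probs = softmax(fused)
```

With `alpha = 0` the decoder ignores the masked prompt entirely, and with `alpha = 1` it decodes from the masked prompt alone. Values in between trade attribute sensitivity against fidelity to the original prompt.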

Reasoning AI safety Alignment Benchmarks

Summary generated by Claude — human-verified
