arXiv cs.CL

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

Signal: 78
Hype: 15
In three lines: COFT is a training-free decoding method that reduces bias in LLM chain-of-thought generation. It uses masked counterfactual prompts and logit fusion to attenuate attribute-driven bias, and it comes with distribution-free marginal validity guarantees. Across 6 models, it reduces bias by 30-55% (median 38%) with negligible utility loss and at most 11% computational overhead.
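
The summary names two ingredients, logit fusion with a counterfactual prompt and a conformal-style guarantee, without giving formulas. The sketch below only illustrates the general idea and is not the paper's method: `fuse_logits` is a hypothetical fusion rule (a log-linear interpolation between the next-token distributions from the original prompt and from an attribute-masked prompt, with an assumed weight `lam`), and `conformal_quantile` is the standard split-conformal quantile that underlies distribution-free marginal coverage. How COFT actually combines the two is not stated in this summary.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def fuse_logits(orig_logits, masked_logits, lam=0.5):
    """Hypothetical fusion rule (an assumption, not taken from the paper).

    Interpolates, in log space, between the next-token distribution from the
    original prompt and the one from a prompt with the sensitive attribute
    masked. lam=0 keeps the original distribution; lam=1 uses only the
    masked (counterfactual) one.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if len(orig_logits) != len(masked_logits):
        raise ValueError("logit vectors must have the same length")
    lo = log_softmax(orig_logits)
    lm = log_softmax(masked_logits)
    mixed = [(1.0 - lam) * a + lam * b for a, b in zip(lo, lm)]
    return log_softmax(mixed)  # renormalise to valid log-probabilities


def conformal_quantile(scores, alpha):
    """Standard split-conformal threshold.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


if __name__ == "__main__":
    # Toy two-token vocabulary: the original prompt strongly prefers token 0,
    # the attribute-masked prompt is indifferent.
    fused = fuse_logits([2.0, 0.0], [0.0, 0.0], lam=0.5)
    print([round(math.exp(x), 3) for x in fused])
    print(conformal_quantile([0.3, 0.1, 0.2], alpha=0.5))
```

With `lam` between 0 and 1, the fused distribution sits between the two inputs, which is one simple way a preference driven only by the masked attribute could be attenuated at decoding time without retraining.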
Reasoning, AI safety, Alignment, Benchmarks

Summary generated by Claude — human-verified