arXiv cs.LG

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

Signal: 72 · Hype: 28
In three lines: SELFCI is a complementary self-distillation framework that optimizes two independent reverse KL divergences to align LLMs with Contextual Integrity (CI). It preserves task-relevant information while minimizing inappropriate disclosures, without costly external supervision, and it outperforms GRPO and other baselines.
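For readers unfamiliar with the term, here is a minimal sketch of what a reverse KL divergence computes, assuming toy categorical distributions over a small vocabulary. The summary does not say what the two reference distributions are or how the two terms are combined, so the names `teacher_utility` and `teacher_privacy`, the weights, and the weighted sum are illustrative assumptions, not the paper's objective.

```python
import math

def reverse_kl(student, teacher, eps=1e-12):
    """KL(student || teacher) for two categorical distributions.

    "Reverse" means the expectation is taken under the student, which
    penalizes the student for placing mass where the teacher has little.
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student, teacher)
        if s > 0
    )

def combined_loss(student, teacher_utility, teacher_privacy,
                  w_utility=1.0, w_privacy=1.0):
    # Hypothetical combination: each reverse KL term is computed
    # independently and then added with an assumed weight.
    return (w_utility * reverse_kl(student, teacher_utility)
            + w_privacy * reverse_kl(student, teacher_privacy))

# Toy next-token distributions over a 3-token vocabulary (made-up values).
student = [0.7, 0.2, 0.1]
teacher_utility = [0.6, 0.3, 0.1]
teacher_privacy = [0.8, 0.1, 0.1]
loss = combined_loss(student, teacher_utility, teacher_privacy)
```

The loss is zero only when the student matches both references, which is one way to read "complementary" in the summary; the source should be consulted for the actual formulation.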
Reinforcement learning · Alignment · AI safety · Papers

Summary generated by Claude — human-verified