arXiv cs.AI

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

Signal: 72
Hype: 28
In three lines
LC-ERD is a self-alignment framework for LLMs that mines latent logical structures via consistency-regulated reward decomposition. It addresses three challenges: label noise from mimetic bias, coarse-grained supervision, and distributional collapse. It uses a Variational Logic Potential and multi-agent value decomposition based on the Individual-Global-Max (IGM) principle.
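
For readers unfamiliar with IGM: the principle requires that the greedy joint action under the total value matches the tuple of each agent's own greedy action. The sketch below is not the paper's method. It assumes a simple additive (VDN-style) decomposition, which is one well-known way to satisfy IGM, and all names and numbers (`q_tables`, `q_tot_additive`, the utility values) are hypothetical.

```python
from itertools import product

# Hypothetical per-agent utilities; the numbers are made up for illustration.
q_tables = [
    [0.1, 0.7, 0.2],       # agent 0, three actions
    [0.5, 0.3],            # agent 1, two actions
    [0.0, 0.4, 0.9, 0.1],  # agent 2, four actions
]

def q_tot_additive(joint_action):
    """Assumed decomposition: total value is the sum of per-agent utilities."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))

def greedy_joint(q_tot, n_actions):
    """Brute-force argmax of the total value over every joint action."""
    return max(product(*(range(n) for n in n_actions)), key=q_tot)

def greedy_individual(tables):
    """Each agent independently picks the argmax of its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in tables)

joint = greedy_joint(q_tot_additive, [len(q) for q in q_tables])
individual = greedy_individual(q_tables)

# IGM holds when the two greedy selections coincide.
igm_holds = joint == individual
print(joint, individual, igm_holds)
```

Under an additive decomposition the check always passes, because maximizing a sum of independent terms maximizes each term separately. How LC-ERD itself decomposes rewards is not specified in this summary.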
Reasoning, Reinforcement learning, Alignment, Papers

Summary generated by Claude — human-verified