arXiv cs.LG

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

Signal: 72
Hype: 15
In three lines: This arXiv paper proposes an online aggregation mechanism for aligning LLMs with human feedback collected through mobile crowdsourcing. It uses a dynamic Bayesian game to incentivize truthful preference reporting from strategic workers, reducing regret from O(T) to O(√T) over T time slots.
Tags: Fine-tuning, Reinforcement learning, Papers

Summary generated by Claude — human-verified