arXiv cs.LG

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

Signal: 78
Hype: 08
In three lines: This paper studies adversarial online learning on hidden-convex losses, i.e., nonconvex losses that become convex after a reparameterization. The authors prove that online gradient descent (OGD) achieves the optimal Θ(√T) regret, improving on the prior O(T^(2/3)) bound. They also characterize a necessary-and-sufficient Hessian compatibility condition and extend the analysis to bandit feedback, with O(T^(3/4)) regret.
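To make the setting concrete, here is a minimal toy sketch of projected OGD run directly on a hidden-convex loss. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the reparameterization `c(x) = x**3`, the squared loss, the domain `[0.5, 2.0]`, the randomly drawn (rather than adversarial) targets, and the standard `D / (G * sqrt(t))` step size. The loss `(x**3 - z)**2` is nonconvex in `x` near the lower end of the domain but convex in `u = x**3`.

```python
import random


def c(x):
    # Hypothetical reparameterization: the loss is convex in u = c(x).
    return x ** 3


def loss(x, z):
    # Nonconvex in x for some (x, z), but convex in u = c(x).
    return (c(x) - z) ** 2


def grad(x, z):
    # Derivative of loss with respect to x (chain rule through c).
    return 2.0 * (c(x) - z) * 3.0 * x ** 2


def run_ogd(T=2000, seed=0, lo=0.5, hi=2.0):
    """Run projected OGD in the original x coordinates; return
    (regret, final iterate, best fixed point in hindsight)."""
    rng = random.Random(seed)
    # Toy targets drawn at random; a real adversary could pick them adaptively.
    zs = [rng.uniform(1.0, 3.0) for _ in range(T)]

    diameter = hi - lo
    # Crude upper bound on |grad| over the domain for z in [1, 3].
    grad_bound = 2.0 * max(abs(c(hi) - 1.0), abs(c(lo) - 3.0)) * 3.0 * hi ** 2

    x = lo
    total_loss = 0.0
    for t, z in enumerate(zs, start=1):
        total_loss += loss(x, z)                   # suffer the loss first
        eta = diameter / (grad_bound * t ** 0.5)   # standard 1/sqrt(t) step
        x = min(hi, max(lo, x - eta * grad(x, z)))  # gradient step + projection

    # In u-space the best fixed point is the mean target; map it back to x.
    x_star = (sum(zs) / T) ** (1.0 / 3.0)
    best_loss = sum(loss(x_star, z) for z in zs)
    return total_loss - best_loss, x, x_star


if __name__ == "__main__":
    regret, x_final, x_star = run_ogd()
    print(f"regret={regret:.2f}  x_final={x_final:.4f}  x_star={x_star:.4f}")
```

The sketch only illustrates the problem setup (gradient steps taken in the original, nonconvex coordinates, with regret measured against the best fixed point in hindsight). It does not reproduce the paper's analysis, its Hessian compatibility condition, or the bandit-feedback variant.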
Papers · Reinforcement learning · Benchmarks

Summary generated by Claude — human-verified