arXiv cs.AI

Multi-Dimensional Behavioral Evaluation of Agentic Stock Prediction Systems Using Large Language Model Judges with Closed-Loop Reinforcement Learning Feedback

Signal: 78
Hype: 15
In three lines
Behavioral evaluation methodology for agentic AI systems: an ensemble of LLM judges scores the system's intermediate decisions across six dimensions (regime detection, routing, adaptation, risk calibration, strategy coherence, error recovery).
The behavioral score correlates with the Sharpe ratio at rho = 0.72.
Closed-loop reinforcement learning feedback using Soft Actor-Critic (SAC) reduces MAPE from 0.61% to 0.54% on the 2017-2025 test set.
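To make the two headline quantities concrete, here is a minimal Python sketch of how an ensemble's per-dimension scores might be pooled into one behavioral score, and how MAPE is computed. The aggregation rule (plain averaging over judges, then over dimensions), the function names, and the dictionary layout are all assumptions for illustration; the summary does not say how the paper weights judges or dimensions.

```python
from statistics import mean

# The six dimensions named in the summary; the identifiers are hypothetical.
DIMENSIONS = (
    "regime_detection",
    "routing",
    "adaptation",
    "risk_calibration",
    "strategy_coherence",
    "error_recovery",
)


def behavioral_score(judge_scores):
    """Pool an ensemble's scores (assumed rule: unweighted means).

    judge_scores: one dict per judge, mapping each dimension to a number.
    Returns (overall score, per-dimension means).
    """
    per_dimension = {
        dim: mean(scores[dim] for scores in judge_scores) for dim in DIMENSIONS
    }
    return mean(per_dimension.values()), per_dimension


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def relative_reduction(before, after):
    """Relative improvement between two error values, in percent."""
    return 100.0 * (before - after) / before
```

Applied to the figures quoted above, `relative_reduction(0.61, 0.54)` gives roughly 11.5%, that is, the drop from 0.61% to 0.54% MAPE is about an 11.5% relative reduction in error.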
AI Agents · Reinforcement learning · Evals · Reasoning

Summary generated by Claude — human-verified