arXiv cs.AI · 19 May 2026

Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech

In three lines: A lightweight transformer predicts iconic co-speech gestures for robots from text and emotion alone, with no audio required at inference time. On the BEAT2 dataset, the model outperforms GPT-4o on both semantic gesture placement classification and intensity regression, while remaining computationally compact for real-time deployment on embodied agents.

Robotics Reasoning Benchmarks

Summary generated by Claude — human-verified
