arXiv cs.AI · 19 May 2026

CAREBench: Evaluating LLMs' Emotion Understanding by Assessing Cognitive Appraisal Reasoning

In three lines: CAREBench is a benchmark that evaluates LLMs' emotion understanding through cognitive appraisal reasoning. It provides complete inferential-chain annotations from first- and third-person perspectives and was used to test 6 models. Stronger models match humans on some tasks but fall short on appraisal reasoning and positive emotion recognition.

Benchmarks Evals Reasoning

Summary generated by Claude — human-verified
