arXiv cs.CL

TeachObs: A Human-Validated Benchmark for Multimodal Teaching Observation and Model Evaluation

Signal: 78
Hype: 15
In three lines: TeachObs is a human-validated multimodal benchmark for classroom video analysis. It contains 30 public lessons from 8 countries, split into 5,158 15-second scenes and annotated by 7 researchers with 39 observation codes (20 visual, 19 non-visual). In an evaluation of 5 vision-capable LLMs across 3 tasks, no single model consistently outperforms the others.
Benchmarks · Vision · Evals · Papers

Summary generated by Claude — human-verified