arXiv cs.AI

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

Signal: 75
Hype: 15
In three lines: QSTRBench is a benchmark that evaluates LLMs' ability to perform qualitative spatial and temporal reasoning (QSTR). It covers nine calculi (Point Algebra, Allen's Interval Algebra, RCC-5/8/22, and others), along with their composition tables, converse relations, and conceptual neighbourhoods. The tested models outperform guessing, but none answers every question correctly. RCC-22 proves the most difficult.
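For readers unfamiliar with the terms, composition and converse can be illustrated on Point Algebra, the simplest of the listed calculi, whose base relations between two points are <, =, and >. The sketch below is only an illustration of the standard Point Algebra table, not code or data from QSTRBench; the names `compose_base`, `compose`, and `converse` are hypothetical.

```python
from itertools import product

# Base relations of Point Algebra between two points on a line.
BASE = ("<", "=", ">")

# Converse: if x r y holds, then y CONVERSE[r] x holds.
CONVERSE = {"<": ">", "=": "=", ">": "<"}


def compose_base(r, s):
    """Relations possible between x and z, given x r y and y s z."""
    if r == "=":
        return {s}
    if s == "=":
        return {r}
    if r == s:
        return {r}
    # x < y and y > z (or the mirror case) leaves x vs z unconstrained.
    return set(BASE)


def compose(R, S):
    """Composition lifted to sets (disjunctions) of base relations."""
    result = set()
    for r, s in product(R, S):
        result |= compose_base(r, s)
    return result


def converse(R):
    """Converse lifted to sets of base relations."""
    return {CONVERSE[r] for r in R}
```

For instance, `compose({"<"}, {"<"})` yields only `<`, while `compose({"<"}, {">"})` yields all three base relations, because knowing x < y and y > z says nothing about how x and z are ordered. The larger calculi follow the same pattern with bigger tables, which is one plausible reason a 22-relation calculus is harder to get right.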
Benchmarks, Reasoning, Evals

Summary generated by Claude — human-verified