arXiv cs.AI·19 May 2026

QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

In three lines: QSTRBench is a benchmark that evaluates how well LLMs handle qualitative spatial and temporal reasoning (QSTR). It covers 9 calculi (Point Algebra, Allen's Interval Algebra, RCC-5/8/22, etc.) with composition tables, converse relations, and conceptual neighbourhoods. The tested models outperform guessing, but none answers every question correctly. RCC-22 proves the most difficult.
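To make "composition" and "converse" concrete, here is a minimal sketch for the simplest of the listed calculi, Point Algebra, whose base relations are <, = and >. It is an illustration only and is not taken from the benchmark; the names `compose_base`, `compose`, and `converse` are hypothetical.

```python
from itertools import product

# Base relations of Point Algebra: how two points on a line can be ordered.
BASE = ("<", "=", ">")

# Converse: if x r y holds, then y CONVERSE[r] x holds.
CONVERSE = {"<": ">", "=": "=", ">": "<"}


def compose_base(r, s):
    """Relations possible between x and z, given x r y and y s z."""
    if r == "=":
        return {s}
    if s == "=":
        return {r}
    if r == s:
        return {r}  # e.g. x < y and y < z forces x < z
    return set(BASE)  # e.g. x < y and y > z leaves x vs z undetermined


def compose(R, S):
    """Lift composition to sets (disjunctions) of base relations."""
    result = set()
    for r, s in product(R, S):
        result |= compose_base(r, s)
    return result


def converse(R):
    """Converse of a set of base relations."""
    return {CONVERSE[r] for r in R}
```

A composition-table question then amounts to asking, for example, what `compose({"<"}, {">"})` contains; a converse question asks for `converse({"<", "="})`. Larger calculi such as Allen's Interval Algebra or RCC-8 follow the same pattern with bigger tables.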


Benchmarks Reasoning Evals

Summary generated by Claude — human-verified
