arXiv cs.AI

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

Signal: 78 · Hype: 15
In three lines: SCICONVBENCH benchmarks LLMs on multi-turn clarification of ill-posed scientific problems across fluid mechanics, solid mechanics, materials science, and partial differential equations (PDEs). The best models resolve only 52.7% of disambiguation cases in fluid mechanics, though they perform better on inconsistency detection. The benchmark evaluates clarification behavior, conversational grounding, and specification fidelity.
Benchmarks · Reasoning · Code generation · Papers

Summary generated by Claude — human-verified