SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science
In three lines:
SCICONVBENCH benchmarks LLMs on multi-turn clarification of ill-posed scientific problems across fluid mechanics, solid mechanics, materials science, and PDEs.
The best models resolve only 52.7% of disambiguation cases in fluid mechanics but perform better on inconsistency detection.
It evaluates clarification behavior, conversational grounding, and specification fidelity.
Summary generated by Claude — human-verified