arXiv cs.CL

DRInQ: Evaluating Conversational Implicature with Controlled Context Variation

Signal: 72, Hype: 18
In three lines: DRInQ is a benchmark that evaluates LLM pragmatic reasoning on conversational implicature. The authors report a generation-inference asymmetry: models generate plausible pragmatic scenarios but fail to recover the intended implications at inference time. Structured prompting improves alignment for smaller models.
Benchmarks, Reasoning, Evals

Summary generated by Claude — human-verified