DRInQ: Evaluating Conversational Implicature with Controlled Context Variation
Signal: 72
Hype: 18
In three lines: DRInQ is a benchmark that evaluates LLM pragmatic reasoning on conversational implicature. Researchers reveal a generation-inference asymmetry: models generate plausible pragmatic scenarios but fail to recover intended implications at inference time. Structured prompting improves alignment for smaller models.
Summary generated by Claude — human-verified