arXiv cs.LG · 27 May 2026

The Constraint Tax: Measuring Validity-Correctness Tradeoffs in Structured Outputs for Small Language Models

In three lines: A study of what structured-output constraints cost small language models (< 3B parameters). Tests on Qwen2.5-0.5B, Qwen2.5-1.5B, and SmolLM2-1.7B show that enforcing JSON schema validity (61.5% → 100%) reduces answer accuracy (19.7% → 11.0%) and raises the share of semantically invalid outputs (49.5% → 88.9%). Recommendation: report schema validity, answer accuracy, and semantic error rates separately.
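To make the recommendation concrete, here is a minimal sketch of reporting the three metrics separately. Everything in it is an assumption for illustration, not the paper's evaluation code: the toy schema `{"answer": <string>}`, the `Sample` record, the `allowed` answer set, and the definition of a "semantic error" as a schema-valid answer that falls outside `allowed`. The summary does not state how the paper defines semantic invalidity or which denominators it uses, so the choices below are marked as such.

```python
import json
from dataclasses import dataclass


@dataclass
class Sample:
    raw_output: str     # text produced by the model
    expected: str       # gold answer
    allowed: frozenset  # answers considered meaningful for this item (assumption)


def parse_if_schema_valid(raw: str):
    """Return the parsed object if it matches the toy schema {"answer": <str>}, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and set(obj) == {"answer"} and isinstance(obj["answer"], str):
        return obj
    return None


def report(samples):
    """Compute the three metrics independently instead of one blended score."""
    n = len(samples)
    valid = correct = semantic_errors = 0
    for s in samples:
        obj = parse_if_schema_valid(s.raw_output)
        if obj is None:
            continue  # schema-invalid: counts against validity and accuracy
        valid += 1
        if obj["answer"] == s.expected:
            correct += 1
        elif obj["answer"] not in s.allowed:
            # Well-formed JSON whose content is meaningless for this item.
            semantic_errors += 1
    return {
        "schema_validity": valid / n,        # denominator: all samples (assumption)
        "answer_accuracy": correct / n,      # denominator: all samples (assumption)
        # denominator: schema-valid samples only (assumption)
        "semantic_error_rate": semantic_errors / valid if valid else 0.0,
    }


if __name__ == "__main__":
    choices = frozenset({"3", "4", "5"})
    demo = [
        Sample('{"answer": "4"}', "4", choices),  # valid and correct
        Sample('{"answer": "5"}', "4", choices),  # valid, wrong but meaningful
        Sample('{"answer": ""}', "4", choices),   # valid JSON, meaningless content
        Sample("The answer is 4", "4", choices),  # not JSON at all
    ]
    print(report(demo))
```

Keeping the counters separate is what lets a result like the one summarized above show up at all: a constrained decoder can push `schema_validity` to 1.0 while `answer_accuracy` falls and `semantic_error_rate` rises, which a single pass/fail score would hide.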


Qwen Code generation Evals Benchmarks

Summary generated by Claude — human-verified
