arXiv cs.CL

Disentangling Language Roles in Multilingual LLM Task Execution

In three lines: MTM-Bench, a controlled benchmark for multilingual task execution, evaluates 20 LLMs across 27 language triplets (instruction, content, and response language, each in English, Spanish, or Chinese). Results show that degradation is organized by the role a language plays in the task structure, with response language as the dominant axis of variation.
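The count of 27 is consistent with assigning each of the three roles one of the three languages (3 × 3 × 3). A minimal sketch of that enumeration follows, assuming the benchmark uses the full cross product; the names `LANGUAGES`, `ROLES`, and `language_triplets` are hypothetical and not taken from the paper.

```python
from itertools import product

# Assumption: the three languages and three roles named in the summary.
LANGUAGES = ("English", "Spanish", "Chinese")
ROLES = ("instruction", "content", "response")


def language_triplets(languages=LANGUAGES):
    """Return every assignment of a language to each role (hypothetical helper)."""
    return [
        dict(zip(ROLES, combo))
        for combo in product(languages, repeat=len(ROLES))
    ]


triplets = language_triplets()
print(len(triplets))  # 27
print(triplets[0])    # {'instruction': 'English', 'content': 'English', 'response': 'English'}
```

Holding two roles fixed and varying the third is one way to isolate the effect of a single role, such as response language, on task performance.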
Benchmarks, Evals

Summary generated by Claude — human-verified