Disentangling Language Roles in Multilingual LLM Task Execution
Signal: 78
Hype: 15
In three lines
MTM-Bench, a controlled benchmark for multilingual task execution, evaluates 20 LLMs across 27 language triplets (instruction/content/response) in English, Spanish, and Chinese. Results show degradation is organized by language role in task structure, with response language as the dominant axis of variation.
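The count of 27 triplets is consistent with assigning each of the three languages independently to each of the three roles (3 × 3 × 3). The sketch below enumerates that grid; the dictionary layout and the names `LANGUAGES`, `ROLES`, and `triplets` are illustrative assumptions, not MTM-Bench's actual data format.

```python
from itertools import product

# Languages and roles named in the summary; the data layout is an assumption.
LANGUAGES = ["English", "Spanish", "Chinese"]
ROLES = ("instruction", "content", "response")

# One entry per assignment of a language to each role.
triplets = [
    dict(zip(ROLES, combo))
    for combo in product(LANGUAGES, repeat=len(ROLES))
]

print(len(triplets))  # 3 ** 3 = 27

# Group by response language, the axis the summary reports as dominant.
by_response = {lang: [t for t in triplets if t["response"] == lang] for lang in LANGUAGES}
for lang, group in by_response.items():
    print(lang, len(group))
```

Grouping by the `response` key gives nine triplets per response language, which is one plausible way to slice results along the axis the summary highlights.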
Summary generated by Claude — human-verified