arXiv cs.CL

UA-Legal-Bench: A Benchmark for Evaluating Large Language Models on Ukrainian Legal Reasoning

In three lines: UA-Legal-Bench evaluates 11 LLMs (3B–675B parameters) on 5 Ukrainian legal reasoning tasks drawn from 99.5M court decisions. Few-shot effects are task-dependent: a +38.6 pp improvement on judgment form classification, but mixed effects on outcome prediction. Accuracy is misleading on imbalanced tasks: the highest-accuracy model (62%) is a majority-class predictor (macro-F1: 23%).
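The gap between accuracy and macro-F1 on imbalanced tasks can be reproduced on toy data. The sketch below is illustrative only: the class names and the 70/20/10 label split are assumptions invented for this example and are not taken from the benchmark. It scores a predictor that always outputs the most frequent class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced label distribution (NOT the benchmark's data).
y_true = ["granted"] * 70 + ["partially_granted"] * 20 + ["denied"] * 10

# Majority-class predictor: always output the most frequent gold label.
majority_label = Counter(y_true).most_common(1)[0][0]
y_pred = [majority_label] * len(y_true)

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}  macro_f1={mf1:.3f}")
```

On this toy split, accuracy equals the majority-class share, while macro-F1 is pulled down by the two classes that are never predicted. This is the same pattern the summary describes for the highest-accuracy model.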
Benchmarks · Evals · Papers

Summary generated by Claude — human-verified