EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
Signal: 78 · Hype: 15
In three lines:
EURO-5K is a 5K-sentence corpus for extracting reporting obligations from EU legislation, drawn from 136 legislative acts.
It compares fine-tuned BERT models with LLMs tuned via QLoRA: generic and legal BERT reach a similar F1 of 0.89, and legal-domain pretraining helps mainly under parameter-efficient tuning.
Performance converges at 3K training samples.
Summary generated by Claude — human-verified