arXiv cs.CL

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

Signal: 78
Hype: 15
In three lines
Retrieval-based approach to multi-label legal annotation: a frozen embedding model with k-NN prediction over documents and label descriptions. On Eurlex (100 labels), Qwen-8B achieves a Macro-F1 of 49.12 versus 40.41 for zero-shot GPT-5.2 and reduces compute by 20-30×. It eliminates hallucinated labels, whereas GPT-5.2 produces 0.12-0.9% out-of-taxonomy labels.
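To make the retrieval idea concrete, here is a minimal sketch of k-NN multi-label prediction over an index that mixes annotated documents and label descriptions. Everything specific in it is an assumption for illustration: the hand-written 3-dimensional vectors stand in for the output of a frozen embedding model, and the cosine similarity, similarity-weighted voting rule, `k`, `threshold`, and the `knn_predict` helper are hypothetical choices, not necessarily what the paper uses.

```python
import math
from collections import defaultdict

# Hypothetical closed taxonomy; the real Eurlex setup has 100 labels.
TAXONOMY = {"agriculture", "trade", "environment"}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query_vec, index, k=3, threshold=0.4):
    """Return labels whose similarity-weighted vote share among the
    k nearest index entries reaches `threshold` (assumed voting rule)."""
    scored = [(cosine(query_vec, vec), labels) for vec, labels in index]
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    total = 0.0
    for sim, labels in neighbours:
        weight = max(sim, 0.0)  # ignore negatively correlated neighbours
        total += weight
        for label in labels:
            votes[label] += weight
    if total == 0.0:
        return set()
    return {label for label, v in votes.items() if v / total >= threshold}

# Toy index: vectors are made up, standing in for frozen-model embeddings.
index = [
    ([1.0, 0.0, 0.0], {"agriculture"}),           # label description
    ([0.0, 1.0, 0.0], {"trade"}),                 # label description
    ([0.0, 0.0, 1.0], {"environment"}),           # label description
    ([0.9, 0.1, 0.0], {"agriculture", "trade"}),  # annotated document
    ([0.1, 0.0, 0.9], {"environment"}),           # annotated document
]

query = [0.8, 0.2, 0.0]  # embedding of a new, unlabeled document
predicted = knn_predict(query, index)
print(sorted(predicted))
```

Because every predicted label is copied from an index entry, the output is always a subset of the taxonomy, which is the sense in which a retrieval-based predictor cannot emit out-of-taxonomy labels. Adding a new label only requires adding its description (and any annotated examples) to the index, with no retraining of the embedding model.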
RAG · Embeddings · Benchmarks · Evals

Summary generated by Claude — human-verified