Curation and Extraction of Drug-Related Entities from Reddit Platform
Signal: 72
Hype: 15
In three lines
ReDose is a dataset of 6,435 Reddit posts annotated by toxicologists for the extraction of DRUG, DOSE, and EFFECT entities. BiomedBERT achieves F1 = 0.843 for DRUG; Llama-3 70B outperforms GPT-4 (F1 = 0.79 vs. 0.72). EFFECT extraction remains challenging (GPT-4 recall = 0.41).
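For readers less familiar with the reported metrics, here is a minimal sketch of how entity-level precision, recall, and F1 can be computed. It assumes exact-match scoring over (label, text) pairs, which may not be the matching criterion the authors used. The function names `prf` and `prf_for_label` and the toy annotations are hypothetical and are not taken from ReDose.

```python
def prf(gold, pred):
    """Return (precision, recall, F1) for two collections of (label, text) entities.

    Assumption: an entity counts as correct only on an exact match of both
    label and text. The paper's actual matching rule is not stated here.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def prf_for_label(gold, pred, label):
    """Score a single entity type, e.g. only EFFECT entities."""
    return prf(
        [e for e in gold if e[0] == label],
        [e for e in pred if e[0] == label],
    )


# Toy, made-up annotations for one post (illustrative only).
gold = [("DRUG", "caffeine"), ("DOSE", "200 mg"),
        ("EFFECT", "jitters"), ("EFFECT", "insomnia")]
pred = [("DRUG", "caffeine"), ("DOSE", "200 mg"), ("EFFECT", "jitters")]

overall = prf(gold, pred)
effect_only = prf_for_label(gold, pred, "EFFECT")
print("overall:", overall)
print("EFFECT:", effect_only)
```

In this toy case the model misses one of two EFFECT mentions, so EFFECT recall is 0.5 while precision stays at 1.0. A low recall figure such as the one reported for GPT-4 on EFFECT reflects the same pattern: many gold entities are never predicted.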
Summary generated by Claude — human-verified