arXiv cs.CL

Generalistic or Specific Embeddings, Which is Better? An Empirical Study on Search for Clinical Coding in Non-English Languages

Signal: 78 · Hype: 15
In three lines:
Comparative study of generic vs. domain-specific embeddings for multilingual clinical-coding search over ICD-10-CM.
A bi-encoder fine-tuned on Gemini-generated synthetic data in 6 languages outperforms BioBERT-ST: Recall@5 (R@5) of 0.822 vs. 0.790, with a major gain in Portuguese (+0.115).
An open recipe for building LLM-based medical retrievers.
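To make the retrieval setup and the R@5 metric concrete, here is a minimal sketch, not the paper's code. It assumes queries and ICD-10-CM code descriptions have already been embedded independently by some bi-encoder (the model itself is not reproduced), that codes are ranked by cosine similarity, and that each query has exactly one gold code, so Recall@k reduces to the share of queries whose gold code lands in the top k. The three code labels are real ICD-10-CM codes, but every vector below is a hypothetical toy value.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_codes(query_vec: Sequence[float], code_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Bi-encoder style retrieval: the query and code descriptions are embedded
    separately, then codes are sorted by similarity to the query, highest first."""
    return sorted(code_vecs, key=lambda c: cosine(query_vec, code_vecs[c]), reverse=True)


def recall_at_k(queries: Dict[str, Sequence[float]], gold: Dict[str, str],
                code_vecs: Dict[str, Sequence[float]], k: int = 5) -> float:
    """Share of queries whose single gold code appears in the top-k ranked codes
    (assumption: one gold code per query)."""
    hits = sum(1 for qid, qvec in queries.items() if gold[qid] in rank_codes(qvec, code_vecs)[:k])
    return hits / len(queries)


# Hypothetical toy embeddings; a trained bi-encoder would produce real ones.
code_vecs = {
    "E11.9": [1.0, 0.0, 0.0],    # type 2 diabetes mellitus without complications
    "I10": [0.0, 1.0, 0.0],      # essential (primary) hypertension
    "J45.909": [0.0, 0.0, 1.0],  # unspecified asthma, uncomplicated
}
queries = {
    "q1": [0.9, 0.1, 0.0],  # closest to E11.9
    "q2": [0.2, 0.1, 0.9],  # closest to J45.909
    "q3": [0.8, 0.3, 0.1],  # gold is I10, but E11.9 ranks first
}
gold = {"q1": "E11.9", "q2": "J45.909", "q3": "I10"}

print(recall_at_k(queries, gold, code_vecs, k=1))  # 2 of 3 queries hit at k=1
print(recall_at_k(queries, gold, code_vecs, k=2))  # all 3 queries hit at k=2
```

With only three toy codes, k=5 would trivially give 1.0, so the sketch uses k=1 and k=2. On a real ICD-10-CM index with tens of thousands of codes, R@5 rewards a model that places the right code among its first five candidates even when it is not ranked first.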
Embeddings · RAG · Benchmarks

Summary generated by Claude — human-verified