arXiv cs.CL

SEA-Embedding: Open and Reproducible Text Embeddings for Southeast Asia

Signal: 78 · Hype: 15
In three lines: SEA-Embedding is an open and reproducible text-embedding pipeline for Southeast Asian languages, trained exclusively on public data. The study examines three core factors: data composition, training objective, and base encoder initialization. The pipeline achieves state-of-the-art results on SEA-BED.
Embeddings, Open source, Papers, Benchmarks

Summary generated by Claude — human-verified