Reddit r/MachineLearning

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

Signal: 75
Hype: 25
In three lines: MONET is an Apache 2.0 licensed dataset of 104.9M high-quality images with captions and metadata, released on Hugging Face. It was built by refining an initial pool of 2.9B images. The release includes a paper, a UMAP visualization, a text/image retrieval tool, and a codebase for training text-to-image (T2I) models.
Image generation, Embeddings, Open source, Benchmarks

Summary generated by Claude — human-verified