Reddit r/MachineLearning·28 May 2026

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]


In three lines: MONET is an Apache 2.0 dataset of 104.9M high-quality images with captions and metadata, released on Hugging Face. It was built from 2.9B source images and then refined. The release includes a paper, a UMAP visualization, a text/image retrieval tool, and a codebase for training T2I models.


Image generation · Embeddings · Open source · Benchmarks

Summary generated by Claude — human-verified

