Reddit r/MachineLearning·2 June 2026

I scraped over 2 million job postings across 100,000+ company career sites into a unified, daily-updated dataset. [P]

In three lines: a user built a large-scale scraping pipeline that aggregates 2M+ active job postings from 100,000+ company career sites. The dataset is published in Parquet format, refreshed daily, and freely accessible. Each posting carries standard fields: title, company, description, location, and URL.
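
The summary says only that the dataset is refreshed daily and that each posting exposes title, company, description, location, and URL. Below is a minimal Python sketch of how such a daily refresh could work. It assumes two things the post does not state: postings are keyed by their URL, and a posting counts as "active" only if it appears in the latest crawl. `JobPosting` and `apply_daily_snapshot` are hypothetical names for illustration, not part of the author's pipeline. Reading the real Parquet files would typically go through a Parquet reader such as `pandas.read_parquet`; the file layout is not described in the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobPosting:
    # The five standard fields named in the summary; plain-string types are an assumption.
    title: str
    company: str
    description: str
    location: str
    url: str


def apply_daily_snapshot(
    current: dict[str, JobPosting], snapshot: list[JobPosting]
) -> tuple[dict[str, JobPosting], dict[str, int]]:
    """Hypothetical refresh step: keep only postings seen in today's crawl.

    Assumes the URL uniquely identifies a posting. If a URL appears twice
    in the snapshot, the later record wins.
    """
    latest = {p.url: p for p in snapshot}
    added = latest.keys() - current.keys()      # new today
    removed = current.keys() - latest.keys()    # no longer listed, treated as closed
    updated = {u for u in latest.keys() & current.keys() if latest[u] != current[u]}
    stats = {"added": len(added), "removed": len(removed), "updated": len(updated)}
    return latest, stats


if __name__ == "__main__":
    day1 = [
        JobPosting("ML Engineer", "Acme", "Build models.", "Remote", "https://example.com/jobs/1"),
        JobPosting("Data Analyst", "Acme", "Write SQL.", "Berlin", "https://example.com/jobs/2"),
    ]
    store, stats = apply_daily_snapshot({}, day1)
    print(stats)  # {'added': 2, 'removed': 0, 'updated': 0}

    day2 = [
        JobPosting("Senior ML Engineer", "Acme", "Build models.", "Remote", "https://example.com/jobs/1"),
        JobPosting("Backend Engineer", "Acme", "Write Go.", "London", "https://example.com/jobs/3"),
    ]
    store, stats = apply_daily_snapshot(store, day2)
    print(stats)  # {'added': 1, 'removed': 1, 'updated': 1}
```

Replacing the whole store with the latest snapshot keeps the "active postings" count honest at the cost of losing history; a real pipeline might instead archive removed postings with a closing date.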

Tags: Tools · Infrastructure · Open source

Summary generated by Claude — human-verified
