I scraped over 2 million job postings across 100,000+ company career sites into a unified, daily-updated dataset. [P]

r/MachineLearning

Over the past few months, I've been working on a high-scale scraping pipeline to aggregate listings directly from company job boards and applicant tracking systems. Mapping over 100,000 distinct companies to their career pages turned out to be a massive engineering headache, but it's finally stable. The result is a unified database of more than 2M active job postings, which I'm opening up to everyone for free. I'm running daily delta refreshes to keep it current.

Dataset Overview

Scale: 2M+ active job listings across 100,000+ unique companies.

Format: Parquet.
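For anyone wondering what a daily delta refresh could look like in practice, here's a minimal sketch. It is not necessarily how this pipeline does it: the `Posting` fields (`posting_id`, `company`, `title`) and the `compute_delta` helper are hypothetical names made up for illustration, and it assumes each posting has a stable ID across days. The idea is just to compare yesterday's snapshot with today's and keep only what was added, removed, or changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    # Hypothetical schema; the real dataset's columns may differ.
    posting_id: str
    company: str
    title: str


def compute_delta(previous, current):
    """Compare two snapshots and return added, removed, and changed postings."""
    prev_by_id = {p.posting_id: p for p in previous}
    curr_by_id = {p.posting_id: p for p in current}

    # New today: IDs that were not in yesterday's snapshot.
    added = [p for pid, p in curr_by_id.items() if pid not in prev_by_id]
    # Gone today: IDs that were in yesterday's snapshot only.
    removed = [p for pid, p in prev_by_id.items() if pid not in curr_by_id]
    # Same ID in both snapshots, but some field differs.
    changed = [
        p for pid, p in curr_by_id.items()
        if pid in prev_by_id and prev_by_id[pid] != p
    ]
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    yesterday = [
        Posting("a1", "Acme", "ML Engineer"),
        Posting("b2", "Acme", "Product Manager"),
    ]
    today = [
        Posting("b2", "Acme", "Senior Product Manager"),
        Posting("c3", "Globex", "Data Analyst"),
    ]
    delta = compute_delta(yesterday, today)
    for kind, postings in delta.items():
        print(kind, [p.posting_id for p in postings])
```

Since the data ships as Parquet, each snapshot could be loaded with something like `pandas.read_parquet` before being diffed this way. At 2M+ rows, a join on the ID column would likely be a better fit than Python dictionaries.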