Versioned Datasets · Open & Reproducible

Download open job listings CSV datasets — pinned by month, built for reproducible work

JobDataPool ships the Job Pool’s reviewed listings as monthly CSV snapshots you can actually cite: same rows the ecosystem consumes, same JSON Schema as /v1/jobs, and a DVC pointer for every release so your notebook does not drift when the pool moves forward. Grab latest.csv when you want the current month; scroll the catalog when you need May, April, or an exact byte-for-byte freeze.

Version catalog

Rows are built from DVC pointers in jobpool-listings-r2: the path: field of each *.csv.dvc file is the canonical CSV URL, and sibling artifacts on the same R2 prefix (JSON, Parquet, API JSON, gzip) are listed under More formats on each row. Older months keep the archive paths recorded in their DVC files.

Release Primary file Published Size Download

/datasets/latest.csv always redirects to the current reviewed month. Need machine-readable metadata? See /v1/sources.

Start here

Four entry points most builders use on day one.

# Always fetch the current reviewed month
curl -L -o listings-latest.csv https://jobdatapool.com/datasets/latest.csv

# Pin June 2026 explicitly
curl -L -o listings-june-2026.csv \
  "https://pub-e2c96b2fef074ee0809919335319632f.r2.dev/listings-june-2026.csv"

# Machine-readable catalog (DVC URLs, schemas, freshness)
curl -s https://api.jobdatapool.com/v1/sources

# Live slice without downloading the full CSV
curl -s "https://api.jobdatapool.com/v1/jobs?limit=25&country_code=US"

What you get in every CSV

These are not scraped HTML dumps. Each row is a structured listing validated against job-listing.schema.json and aligned with JPE-RFC-0001 trust fields: employer context, title, location signals, url, apply_link, ingest timestamps, and listing_closed when a role has gone stale.

  • Monthly versioning — stable filenames (listings-<month>-<YYYY>.csv, for example listings-june-2026.csv) plus DVC receipts.
  • Reviewed before publish — staged intake documented at /docs/data-sources/.
  • Attribution preserved — provenance columns stay populated for downstream audits.
  • Schema parity with the API — bulk and live paths share one contract.
  • No API key for CSV — fair-use limits apply to high-volume /v1/jobs query traffic, not your first download.

How teams use these files

Warehouse bootstrap

Land latest.csv in Snowflake, BigQuery, or Postgres, validate against the schema, and treat JobDataPool as an external dimension table for hiring-market analytics.

ML & ranking features

Train role classifiers, geography normalizers, or fit models on a frozen month. Pin the DVC pointer in your experiment config so reruns stay honest.
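
One way to keep reruns honest is to store a checksum next to the pinned URL and refuse to train when the bytes differ. The sketch below assumes you record the SHA-256 yourself on first download. The function name verify_snapshot is hypothetical, and the sketch does not parse the hash fields that DVC pointer files carry. It relies on sha256sum from GNU coreutils; on macOS, shasum -a 256 is the usual substitute.

```shell
# Verify a downloaded snapshot against a checksum you recorded earlier.
# Usage: verify_snapshot <file> <expected-sha256>
# verify_snapshot is a hypothetical helper, not part of JobDataPool or DVC.
verify_snapshot() {
  file="$1"
  expected="$2"
  # sha256sum prints "<hash>  <filename>"; keep only the hash.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file matches the pinned checksum"
  else
    echo "mismatch: $file has $actual, expected $expected" >&2
    return 1
  fi
}
```

Call it as verify_snapshot listings-june-2026.csv "$PINNED_SHA256" before any training step, with PINNED_SHA256 read from your experiment config.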

Hybrid live + bulk

Import a snapshot, then poll /v1/jobs for deltas. /v1/sources tells you when a new monthly drop landed without re-scraping the catalog page.
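
A rough sketch of that polling loop: save each /v1/sources response and compare it with the previous one. The response format is not described on this page, so the sketch compares whole response bodies rather than a named field. The helper names catalog_changed and poll_sources and the .jobdatapool cache directory are hypothetical.

```shell
# Report whether the catalog changed since the last poll.
# Usage: catalog_changed <previous-response-file> <current-response-file>
# Exit status 0 means "changed" (or no previous copy exists yet).
catalog_changed() {
  prev="$1"
  cur="$2"
  if [ ! -f "$prev" ]; then
    return 0
  fi
  # cmp -s exits 0 when the files are identical, so negate it.
  ! cmp -s "$prev" "$cur"
}

# Fetch the catalog, compare it with the cached copy, then update the cache.
# The state directory is a hypothetical local cache location.
poll_sources() {
  state_dir="${1:-.jobdatapool}"
  mkdir -p "$state_dir"
  curl -fsS https://api.jobdatapool.com/v1/sources -o "$state_dir/sources.new" || return 2
  if catalog_changed "$state_dir/sources.json" "$state_dir/sources.new"; then
    echo "catalog changed: check for a new monthly snapshot"
  fi
  mv "$state_dir/sources.new" "$state_dir/sources.json"
}
```

If the response includes a per-request timestamp, a byte comparison will report a change on every poll; in that case, extract and compare only the release entries.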

CI contract tests

Assert incoming partner feeds still match JSON Schemas. Fail builds when columns drift—before bad rows reach production dashboards.
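
A lightweight first gate, short of full JSON Schema validation, is to compare the CSV header row against a column list kept in version control. The sketch below assumes an unquoted, comma-separated header; the function name check_columns and any expected header you pass in are illustrative, not taken from job-listing.schema.json.

```shell
# Compare a CSV's header row against an expected column list.
# Usage: check_columns <csv-file> <expected-header>
# check_columns is a hypothetical helper; it checks column names and order only.
check_columns() {
  csv="$1"
  expected="$2"
  # Read the first line and strip a trailing carriage return, if any.
  actual="$(head -n 1 "$csv" | tr -d '\r')"
  if [ "$actual" != "$expected" ]; then
    echo "column drift in $csv" >&2
    echo "  expected: $expected" >&2
    echo "  actual:   $actual" >&2
    return 1
  fi
}
```

In CI, a non-zero exit status from check_columns fails the job, which stops a drifted feed before it is loaded.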

Common questions

How is this different from scraping job boards myself?

You inherit a pooled, reviewed normalization pass: deduped IDs, consistent columns, closed-listing flags, and source URLs that survived validation. You spend time on analysis, not on rebuilding the same ETL every month.

Which file should I cite in a paper or notebook?

Cite the specific month’s CSV filename and its DVC pointer URL from the catalog above. Use latest.csv only when you explicitly want a moving target.

Can I use this with mewannajob or jobpool.live?

Yes. Consumer surfaces in the Job Pool ecosystem read from the same lineage. This page is the canonical download front door; transparency tools may expose additional contributor-facing slices.