Download open job listings CSV datasets — pinned by month, built for reproducible work
JobDataPool ships the Job Pool’s reviewed listings as monthly CSV snapshots you can actually cite: same rows the ecosystem consumes, same JSON Schema as /v1/jobs, and a DVC pointer for every release so your notebook does not drift when the pool moves forward.
Grab latest.csv when you want the current month; scroll the catalog when you need May, April, or an exact byte-for-byte freeze.
Rows are built from DVC pointers in jobpool-listings-r2: the path: field of each *.csv.dvc file is the canonical CSV URL, and sibling artifacts on the same R2 prefix (JSON, Parquet, API JSON, gzip) live under More formats on each row. Older months keep archive paths from their DVC files.
Catalog columns: Release, Primary file, Published, Size, Download
/datasets/latest.csv always redirects to the current reviewed month.
Need machine-readable metadata? See /v1/sources.
# Always fetch the current reviewed month
curl -L -o listings-latest.csv https://jobdatapool.com/datasets/latest.csv
# Pin June 2026 explicitly
curl -L -o listings-june-2026.csv \
"https://pub-e2c96b2fef074ee0809919335319632f.r2.dev/listings-june-2026.csv"
# Machine-readable catalog (DVC URLs, schemas, freshness)
curl -s https://api.jobdatapool.com/v1/sources
# Live slice without downloading the full CSV
curl -s "https://api.jobdatapool.com/v1/jobs?limit=25&country_code=US"
What you get in every CSV
These are not scraped HTML dumps. Each row is a structured listing validated against
job-listing.schema.json and aligned with
JPE-RFC-0001 trust fields: employer context, title, location signals,
url, apply_link, ingest timestamps, and listing_closed when a role has gone stale.
Monthly versioning: stable filenames (listings-<month>-YYYY.csv, for example listings-june-2026.csv) plus DVC receipts.
Attribution preserved: provenance columns stay populated for downstream audits.
Schema parity with the API: bulk and live paths share one contract.
No API key for CSV: fair-use limits apply to high-volume /v1/jobs query traffic, not your first download.
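A minimal sketch of reading a snapshot with Python's standard library and dropping closed roles. The column names url, apply_link, and listing_closed come from the list above; how listing_closed is encoded is not stated on this page, so the truthy markers below are an assumption, and the sample rows are invented for illustration.

```python
import csv
import io

# Assumption: these strings mark a closed listing. The page does not
# specify the encoding of listing_closed, so adjust to the real data.
CLOSED_VALUES = {"true", "1", "yes"}


def open_listings(csv_text):
    """Return rows whose listing_closed column is not a closed marker."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if (row.get("listing_closed") or "").strip().lower() not in CLOSED_VALUES
    ]


# Invented sample data, not real JobDataPool rows.
sample = (
    "title,url,apply_link,listing_closed\n"
    "Data Engineer,https://example.com/a,https://example.com/a/apply,false\n"
    "Analyst,https://example.com/b,https://example.com/b/apply,true\n"
)
rows = open_listings(sample)
print([r["title"] for r in rows])
```

An empty or missing listing_closed value is treated as open here; flip that default if your analysis should be conservative about stale roles.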
How teams use these files
Warehouse bootstrap
Land latest.csv in Snowflake, BigQuery, or Postgres, validate against the schema, and treat JobDataPool as an external dimension table for hiring-market analytics.
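As a rough sketch of the landing step, the snippet below loads CSV text into a table. SQLite stands in for Snowflake, BigQuery, or Postgres only so the example runs anywhere; the table name job_listings, the all-text column types, and the sample rows are assumptions, not part of JobDataPool.

```python
import csv
import io
import sqlite3


def load_snapshot(conn, csv_text, table="job_listings"):
    """Create a table from the CSV header and insert every row as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Rows whose width differs from the header are skipped, not repaired.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Invented sample data, not real JobDataPool rows.
sample = (
    "title,url,listing_closed\n"
    "Data Engineer,https://example.com/a,false\n"
    "Analyst,https://example.com/b,true\n"
)
conn = sqlite3.connect(":memory:")
inserted = load_snapshot(conn, sample)
open_count = conn.execute(
    "SELECT COUNT(*) FROM job_listings WHERE listing_closed = 'false'"
).fetchone()[0]
print(inserted, open_count)
```

In a real warehouse you would likely map columns to typed fields from the schema rather than loading everything as text.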
ML & ranking features
Train role classifiers, geography normalizers, or fit models on a frozen month. Pin the DVC pointer in your experiment config so reruns stay honest.
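One way to keep reruns honest is to store the month's filename next to a checksum and refuse to train if the file on disk differs. The sketch below assumes you recorded an md5 digest for the frozen file (DVC pointer files commonly carry one, but whether this catalog's pointers do is an assumption); DatasetPin and verify_pin are hypothetical names, and the sample file is invented.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetPin:
    """Hypothetical experiment-config entry for one frozen month."""
    filename: str  # the month's CSV filename from the catalog
    md5: str       # digest you recorded when you froze the file


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large snapshots do not fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pin(pin, directory="."):
    """Return the file path if its digest matches the pin, else raise."""
    path = Path(directory) / pin.filename
    actual = file_md5(path)
    if actual != pin.md5:
        raise ValueError(f"{pin.filename}: expected md5 {pin.md5}, got {actual}")
    return path


# Demo with an invented file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "listings-sample.csv"
    sample.write_bytes(b"title,url\nData Engineer,https://example.com/a\n")
    pin = DatasetPin("listings-sample.csv", file_md5(sample))
    verified = verify_pin(pin, tmp)
    print(verified.name)
```

Calling verify_pin at the top of a training script turns a silently changed input into an immediate failure.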
Hybrid live + bulk
Import a snapshot, then poll /v1/jobs for deltas. /v1/sources tells you when a new monthly drop landed without re-scraping the catalog page.
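The overlay step can be sketched as a keyed merge. This assumes each listing carries a stable identifier (called id here, which is a guess at the field name) and that rows fetched from /v1/jobs arrive as dictionaries using the same field names as the CSV; the response format is not documented on this page, and the rows below are invented.

```python
def merge_deltas(snapshot_rows, delta_rows, key="id"):
    """Overlay delta rows on snapshot rows; delta fields win on the same key."""
    merged = {row[key]: row for row in snapshot_rows}
    for row in delta_rows:
        # Keep snapshot fields the delta omits, overwrite the ones it sends.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Invented rows: one role closes, one new role appears.
snapshot = [
    {"id": "1", "title": "Data Engineer", "listing_closed": "false"},
    {"id": "2", "title": "Analyst", "listing_closed": "false"},
]
deltas = [
    {"id": "2", "listing_closed": "true"},
    {"id": "3", "title": "ML Engineer", "listing_closed": "false"},
]
current = merge_deltas(snapshot, deltas)
print([(r["id"], r["listing_closed"]) for r in current])
```

Because the merge is keyed, re-applying the same deltas leaves the result unchanged, which makes a polling loop safe to retry.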
CI contract tests
Assert incoming partner feeds still match JSON Schemas. Fail builds when columns drift—before bad rows reach production dashboards.
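A header-level drift check is a cheap first gate, sketched below with the standard library only. It compares a feed's column names to the properties and required keys of a JSON Schema and does not validate cell values, so it complements full schema validation rather than replacing it. The schema fragment and feed are invented for illustration and are not the contents of job-listing.schema.json.

```python
import csv
import io


def column_drift(csv_text, schema):
    """Compare a CSV header against a JSON Schema's properties and required."""
    header = next(csv.reader(io.StringIO(csv_text)))
    known = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    return {
        "missing": sorted(required - set(header)),
        "unexpected": sorted(set(header) - known),
    }


# Invented schema fragment and feed, for illustration only.
schema = {
    "properties": {"title": {}, "url": {}, "apply_link": {}, "listing_closed": {}},
    "required": ["title", "url"],
}
feed = "title,apply_link,salary_band\nAnalyst,https://example.com/apply,B\n"
drift = column_drift(feed, schema)
print(drift)
```

In CI, failing the build whenever either list is non-empty catches renamed or dropped columns before any rows are loaded.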
Common questions
How is this different from scraping job boards myself?
You inherit a pooled, reviewed normalization pass: deduped IDs, consistent columns, closed-listing flags, and source URLs that survived validation. You spend time on analysis, not rebuilding the same ETL every month.
Which file should I cite in a paper or notebook?
Cite the specific month’s CSV filename and its DVC pointer URL from the catalog above. Use latest.csv only when you explicitly want a moving target.
Can I use this with mewannajob or jobpool.live?
Yes. Consumer surfaces in the Job Pool ecosystem read from the same lineage. This page is the canonical download front door; transparency tools may expose additional contributor-facing slices.