Canonical Data Layer · Job Pool

Canonical job data infrastructure for the Job Pool ecosystem.

JobDataPool hosts the public API, OpenAPI contract, JSON Schemas, versioned datasets, and RFCs that consumer products, transparency surfaces, and contributor tooling read from. JPE-RFC-0001 and JPE-RFC-0002 draw the boundaries — ingestion, normalization, and provenance live here so job sites, transparency tools, and contributor pipelines do not each reinvent the same plumbing.

Canonical endpoint: https://api.jobdatapool.com/v1/jobs
Compatibility alias: https://jobdatapool.com/api/jobs
Contracts: /openapi.json, /schemas/*

HTTP surface

A small API with contracts you can codegen against

Canonical routes sit on /v1/*; older integrations can keep using /api/*. Every shape worth trusting is spelled out in OpenAPI and JSON Schema.

What runs where

| Path | Role | Serving Layer |
| --- | --- | --- |
| /api/* and /v1/* | Lightweight HTTP API | Netlify Functions plus edge controls |
| /schemas/* | Machine-readable contracts | Static hosting |
| /datasets/* | Bulk and snapshot access | Static redirects and download surface |
| /rfc/* | Docs hub and architecture records | Static docs |

Request contract

| Field | Type | Notes |
| --- | --- | --- |
| limit | number | 1-50, defaults to 25 |
| industries | csv or array | Case-insensitive partial match |
| country_code | string | 2-letter ISO-style code, e.g. US |
| distinct_by | string | Optional dedupe key: job_title, apply_link, or url |
| auth | none | No auth in the v1 surface |

curl -s "https://api.jobdatapool.com/v1/jobs?limit=5&country_code=US"

curl -s "https://api.jobdatapool.com/v1/health"
curl -s "https://api.jobdatapool.com/v1/sources"
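
The request contract above can also be enforced client-side before a call goes out. The following is a minimal sketch, not an official client: the helper name build_jobs_url is hypothetical, and it assumes the csv form of industries is a single comma-joined query value and that upper-casing country_code is harmless.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.jobdatapool.com/v1/jobs"  # canonical endpoint from this page
DISTINCT_KEYS = {"job_title", "apply_link", "url"}  # documented distinct_by values


def build_jobs_url(limit=25, industries=None, country_code=None, distinct_by=None):
    """Return a /v1/jobs URL, rejecting values outside the documented contract."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    params = {"limit": limit}
    if industries:
        # Assumption: the csv form is one comma-joined query value.
        params["industries"] = ",".join(industries)
    if country_code:
        if len(country_code) != 2 or not country_code.isalpha():
            raise ValueError("country_code must be a 2-letter code such as US")
        # Assumption: the API accepts upper-case codes, as in the example US.
        params["country_code"] = country_code.upper()
    if distinct_by:
        if distinct_by not in DISTINCT_KEYS:
            raise ValueError(f"distinct_by must be one of {sorted(DISTINCT_KEYS)}")
        params["distinct_by"] = distinct_by
    return f"{BASE_URL}?{urlencode(params)}"


print(build_jobs_url(limit=5, country_code="US"))
# https://api.jobdatapool.com/v1/jobs?limit=5&country_code=US
```

Rejecting an out-of-range limit locally avoids spending one of the rate-limited requests on a call that cannot succeed as intended.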

Try it

The playground tries the canonical API host first, then falls back to on-site aliases if needed.
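
The same canonical-first, alias-second order is easy to reproduce in a client. This is a rough sketch of the idea rather than the playground's actual code: fetch_with_fallback and the simulated flaky fetcher are hypothetical, and the fetch function is passed in so the example runs without network access.

```python
from typing import Callable, Sequence

# Hosts listed on this page: canonical first, compatibility alias second.
HOSTS = (
    "https://api.jobdatapool.com/v1/jobs",
    "https://jobdatapool.com/api/jobs",
)


def fetch_with_fallback(fetch: Callable[[str], object], urls: Sequence[str] = HOSTS):
    """Try each URL in order and return (url, result) from the first that succeeds."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all hosts failed: " + "; ".join(errors))


def flaky(url):
    """Simulated fetcher: the canonical host is down, the alias answers."""
    if url.startswith("https://api."):
        raise ConnectionError("simulated outage")
    return []


print(fetch_with_fallback(flaky)[0])
# https://jobdatapool.com/api/jobs
```

In real use the fetch argument could wrap urllib.request.urlopen with a timeout and JSON decoding.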

For builders

Launch a job product without rebuilding the data plane

JobDataPool handles listings, schemas, and provenance. Focus your product on audience, ranking, editorial UX, and apply-flow conversion.

01 · Prototype against live data

Query /v1/jobs for listings and /v1/sources for source metadata. Point staging at the compatibility host until you are ready to cut over to the canonical API.

02 · Lock shapes with contracts

Generate clients, tests, and tooling from /openapi.json and /schemas/job-listing.schema.json so staging and production share the same validated shapes.

03 · Bootstrap from snapshots

Use versioned CSVs for demos, notebooks, and one-off imports. Use the API for current listings and ongoing sync.
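
For a demo or notebook, a snapshot can be read with nothing more than the standard library. The sketch below uses an inline, made-up two-row sample in place of a real release file. The column names come from the schema table on this page, while the assumption that listing_closed is serialized as the text true or false is ours.

```python
import csv
import io

# Hypothetical two-row sample using column names from the schema table on this page.
SAMPLE_CSV = """id,job_title,company_name,country_code,listing_closed
a1,Data Engineer,Example Co,US,false
b2,Nurse,Sample Health,CA,true
"""


def open_listings(csv_text, country_code):
    """Yield rows for one country whose listing_closed flag is not true."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: the closed flag arrives as the text "true" or "false".
        closed = row.get("listing_closed", "").strip().lower() == "true"
        if row.get("country_code") == country_code and not closed:
            yield row


print([row["id"] for row in open_listings(SAMPLE_CSV, "US")])
# ['a1']
```

For a downloaded release, the io.StringIO wrapper would be replaced by an open file handle.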

04 · Respect domain boundaries

Do not mirror consumer SEO pages from mewannajob.com. Build your own search and editorial surface, and integrate against JobDataPool for the data contract.

Documentation

Everything worth citing lives in one docs hub

API reference, publisher discovery signals, crawl etiquette, agent instructions, and the RFCs that explain why the pool is split across domains.

Start with contracts

Read the API, source catalog, schemas, and CSV releases before building a custom crawler.

Read API docs

Signal from your job board

Publishers can mark preferred access via robots comments, Link headers, HTML tags, verification tokens, and feed metadata.

Read publisher docs

Responsible crawling

Prefer cached structured access, identifiable user agents, visible attribution, and conservative refresh rates.

Read access policy

Coding agent instructions

Cursor, Claude, and Codex use one shared instruction set for this repository.

Open agent docs

Community

Contribute to data quality and governance

Public dashboards have logged 38,822+ requests across the ecosystem. Ongoing work needs human review: source vetting, schema maintenance, listing quality, and a forum where builders can coordinate.

Launch Signal

38,822+ requests so far

1,294 avg/day · 3 sites

Community

Discuss on r/jobdatapool

Source reports, listing quality issues, RFC discussion, launch updates, and builder coordination.

Review committee

Join the review committee

Volunteer to vet sources, review schema and RFC changes, validate listing provenance, improve contributor workflows, or help with community moderation.

Bulk data

Monthly CSV releases with DVC receipts

We discover versions from DVC pointers in the Job Pool data repo, then resolve them to download URLs. This is the same lineage that mewannajob.com and jobpool.live use, offered here as a plain download table.
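
To make the pointer-to-URL step concrete, here is a hedged sketch of how a DVC pointer's md5 maps to an object path under a remote. It assumes the usual DVC content-addressed layout (the first two hex characters as a directory, with a files/md5/ prefix in DVC 3.x) and is not necessarily how this site resolves its downloads. The sample pointer, the file name jobs.csv, and the helper name are hypothetical.

```python
import re

# Hypothetical single-file pointer; real pointers live in the Job Pool data repo.
SAMPLE_POINTER = """outs:
- md5: d41d8cd98f00b204e9800998ecf8427e
  size: 0
  path: jobs.csv
"""


def dvc_object_path(pointer_text, dvc3_layout=True):
    """Map the md5 in a single-file .dvc pointer to its relative object path."""
    match = re.search(r"^\s*-?\s*md5:\s*([0-9a-f]{32})\s*$", pointer_text, re.MULTILINE)
    if match is None:
        # Directory outputs (md5 ending in .dir) are deliberately not handled here.
        raise ValueError("no single-file md5 entry found in pointer")
    digest = match.group(1)
    prefix = "files/md5/" if dvc3_layout else ""  # older layouts omit the prefix
    return f"{prefix}{digest[:2]}/{digest[2:]}"


print(dvc_object_path(SAMPLE_POINTER))
# files/md5/d4/1d8cd98f00b204e9800998ecf8427e
```

Joining that relative path to the remote's base URL, which this page does not state, would give a download location.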

Schema

One listing shape, many surfaces

The fields below mirror the live CSV contract. The JSON Schema is the authoritative machine-readable copy.

| Field | Typical Type | Description |
| --- | --- | --- |
| id | string | Stable record identifier |
| ingestion_date | date string | First ingest date (YYYY-MM-DD) |
| job_title | string | Role title |
| company_name | string | Hiring company |
| job_location | string | Listed location text |
| job_seniority_level | string | Seniority tag |
| job_employment_type | string | Employment type |
| job_industries | string or array-like text | Industry tags |
| job_summary | string | Summary text |
| job_base_pay_range | string | Compensation text |
| job_posted_date | date string | Posting date |
| competitiveness_score | number/string | Optional model score |
| skills | string | Skill list text |
| certifications | string | Certification text |
| industries | string | Alternate industries field |
| achievements | string | Optional metadata |
| url | url string | Source URL |
| apply_link | url string | Application URL |
| country_code | string | Country code |
| ingest_utc_date | date string | Pipeline ingest date |
| ingest_utc_hour | number/string | Pipeline ingest hour |
| year | number/string | Dataset release year |
| month | number/string | Dataset release month (1–12) |
| validated_on | date string | Last validation date (YYYY-MM-DD) |
| listing_closed | boolean/string | Closed flag from latest validation |
| posting_url | url string | Masked redirect URL on API responses |
| source_business_url | url string | Business/source site when supplied upstream |
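
Several fields are typed number/string or boolean/string, so consumers may want to coerce them once at the edge. The following is an illustrative sketch only: the helper names are hypothetical, the set of truthy strings is an assumption, and the sample row is made up.

```python
from datetime import date


def to_int(value):
    """Accept 7, "7", None, or "" and return an int or None."""
    if value is None or value == "":
        return None
    return int(value)


def to_bool(value):
    """Accept a real boolean or text such as "true" / "False"."""
    if isinstance(value, bool):
        return value
    # Assumption: these spellings mean true; anything else is treated as false.
    return str(value).strip().lower() in {"true", "1", "yes"}


def normalize(row):
    """Return a copy of row with loosely typed fields coerced to Python types."""
    out = dict(row)
    for key in ("year", "month", "ingest_utc_hour"):
        if key in out:
            out[key] = to_int(out[key])
    if "listing_closed" in out:
        out["listing_closed"] = to_bool(out["listing_closed"])
    for key in ("ingestion_date", "validated_on"):  # documented as YYYY-MM-DD
        if out.get(key):
            out[key] = date.fromisoformat(out[key])
    return out


print(normalize({"year": "2025", "month": 3, "listing_closed": "False"}))
# {'year': 2025, 'month': 3, 'listing_closed': False}
```

The JSON Schema remains the authoritative definition; this only smooths over the CSV-versus-JSON typing differences the table hints at.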

Trust · RFC-0001

Verifiable trust signals

JPE-RFC-0001 defines provenance, freshness, source-aware records, structured contracts, and controlled openness as requirements. JobDataPool publishes the contracts, domain boundaries, and operational metrics needed to verify those claims.

Provenance in every record

Each listing includes url, apply_link, source_business_url, ingest_utc_date, and ingest_utc_hour for tracing origin and lifecycle.
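
As a sketch of how those fields can be used together, the snippet below pulls the provenance trail out of a record and combines the ingest date and hour into one UTC timestamp. It assumes ingest_utc_date is formatted YYYY-MM-DD and ingest_utc_hour is 0-23, neither of which is stated here. The record and its example.com URLs are made up.

```python
from datetime import datetime, timezone

# Provenance fields named on this page.
PROVENANCE_FIELDS = (
    "url",
    "apply_link",
    "source_business_url",
    "ingest_utc_date",
    "ingest_utc_hour",
)


def provenance(record):
    """Return the provenance fields plus a combined ingested_at timestamp."""
    trail = {name: record.get(name) for name in PROVENANCE_FIELDS}
    ingested_at = None
    if trail["ingest_utc_date"] and trail["ingest_utc_hour"] not in (None, ""):
        # Assumption: date is YYYY-MM-DD and hour is an integer from 0 to 23.
        day = datetime.strptime(trail["ingest_utc_date"], "%Y-%m-%d")
        ingested_at = day.replace(hour=int(trail["ingest_utc_hour"]), tzinfo=timezone.utc)
    trail["ingested_at"] = ingested_at
    return trail


record = {  # hypothetical record
    "url": "https://example.com/jobs/123",
    "apply_link": "https://example.com/jobs/123/apply",
    "ingest_utc_date": "2025-03-09",
    "ingest_utc_hour": "14",
}
print(provenance(record)["ingested_at"].isoformat())
# 2025-03-09T14:00:00+00:00
```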

Contracts you can generate from

The OpenAPI 3.1 spec and JSON Schemas are public and versioned by URL. Point SDKs, agents, and CI at those files instead of parsing page HTML.

Freshness you can see

Ingest timestamps and /v1/launch-metrics show how recently the pool was exercised. The usage panel shares only coarse daily totals — never IPs, never paths.

Open, with guardrails

v1 stays no-auth but rate-limited per IP/origin: 5 requests per 60 seconds, at most 50 listings per call, and 200 listings per 60 seconds, plus cache-friendly headers, per RFC-0001.
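
The request and listing limits interact: at 50 listings per call, four calls already use the 200-listing allowance (4 × 50 = 200), so the fifth request in the window has to wait. A client can track both budgets locally. This is a minimal sliding-window sketch under two assumptions: that the server counts listings by the requested limit, and that windows slide rather than reset on fixed boundaries. The class name RequestBudget is hypothetical.

```python
import time
from collections import deque


class RequestBudget:
    """Client-side tracker for 5 requests and 200 listings per 60 seconds."""

    def __init__(self, max_requests=5, max_listings=200, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_listings = max_listings
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, listings requested)

    def _prune(self, now):
        # Forget calls that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, limit):
        """Seconds to wait before a call asking for `limit` listings fits."""
        now = self.clock()
        self._prune(now)
        events = list(self.events)
        # Imagine the oldest calls expiring one by one until both limits hold.
        for dropped in range(len(events) + 1):
            kept = events[dropped:]
            used = sum(count for _, count in kept)
            if len(kept) < self.max_requests and used + limit <= self.max_listings:
                if dropped == 0:
                    return 0.0
                return events[dropped - 1][0] + self.window - now
        raise ValueError("limit exceeds the per-window listing budget")

    def record(self, limit):
        """Register a call that was just sent."""
        self.events.append((self.clock(), limit))


now = [0.0]  # fake clock so the example is deterministic
budget = RequestBudget(clock=lambda: now[0])
for _ in range(4):
    budget.record(50)
print(budget.wait_time(50))
# 60.0
```

Because the limits apply per IP/origin, other clients sharing the same address can still exhaust the budget, so a 429-style response should be handled regardless.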

Explicit domain roles

mewannajob.com and thehiringcafe.com are for seekers. jobpool.live is for transparency. datapool.work runs ingestion. jobdatapool.com holds the canonical data. RFC-0002 draws the lines.

Public architecture records

The RFC series documents why the ecosystem is split across domains. Stable URLs, a public changelog, and r/jobdatapool support open design discussion.

Sibling sites

Related domains, separate responsibilities

JobDataPool stays infrastructure-first. Each linked domain owns a different slice of the experience.

mewannajob.com — for job seekers

  • Job search and discovery built on pooled listings data.
  • CSV Lab for poking at releases on desktop and mobile.
  • Resume and fit workflows for people actually applying.

Ecosystem Topology

Normative flow from ingestion to consumer UX

Matches JPE-RFC-0001 domain responsibilities and JPE-RFC-0002 linking strategy.

datapool.work

Ingestion Operations

Scraper orchestration, contributor ops, moderation workflows.

jobdatapool.com

Canonical Infrastructure

API, schemas, versioned datasets, documented contracts.

jobpool.live

Transparency Layer

Bulk downloads, scraper docs, contributor visibility.

mewannajob.com

Consumer Product

Job search, alerts, and conversion outcomes.

FAQ

Frequently asked questions about JobDataPool

Quick answers for builders, researchers, and AI agents evaluating the Job Pool ecosystem. For a deeper read, the docs hub, JPE-RFC-0001, and JPE-RFC-0002 are the source of truth.

What is JobDataPool?

JobDataPool is the canonical data infrastructure layer of the Job Pool ecosystem. It hosts the public /v1 REST API, OpenAPI 3.1 contract, JSON Schemas, versioned CSV datasets, and architecture RFCs that downstream consumer products, transparency surfaces, and contributor tooling read from.

Is the JobDataPool API free and public?

Yes. The /v1 API is public and no-auth, rate-limited per IP and origin (5 requests per 60 seconds, 50 listings per request, 200 listings per 60 seconds). Endpoints include /v1/jobs, /v1/sources, /v1/launch-metrics, and /v1/health. Cache aggressively: a canonical response can be treated as unchanged for the lifetime stated in its Cache-Control header.
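
A client that wants to honor that caching advice needs the lifetime from the header. The helper below is a small sketch with a hypothetical name, and the sample header value is made up rather than taken from a real response.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age from a Cache-Control header, or None when absent."""
    if not cache_control:
        return None
    directives = [part.strip().lower() for part in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0  # treat as not reusable without revalidation
    for directive in directives:
        match = re.fullmatch(r'max-age="?(\d+)"?', directive)
        if match:
            return int(match.group(1))
    return None


print(max_age_seconds("public, max-age=300, stale-while-revalidate=60"))
# 300
```

Reusing a cached body for that many seconds also keeps repeat lookups from counting against the request limit.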

How is JobDataPool different from mewannajob.com, thehiringcafe.com, and jobpool.live?

JPE-RFC-0002 splits the Job Pool ecosystem into four cooperating roles: mewannajob.com and thehiringcafe.com are the consumer job-search products, jobpool.live is the transparency layer with bulk downloads and scraper outputs, datapool.work runs ingestion operations, and jobdatapool.com is the canonical data infrastructure where the API, schemas, and reviewed datasets live.

How do I become a job data source for the pool?

There are three documented integration paths: expose a structured feed for JobPoolBot to crawl, submit a reviewed scraper on jobpool.live/docs/submissions, or upload a CSV that conforms to job-listing.schema.json. The full submission guide lives at /docs/data-sources/.

Is the job listings dataset openly downloadable?

Yes. Versioned CSV snapshots are published openly under /datasets/, with DVC pointers and source metadata exposed via /v1/sources. Use the API for live query behavior and the CSVs for bulk bootstrap, reproducible demos, or offline analysis.

Can my job board publish a preferred-access signal toward JobDataPool?

Yes. Publishers can expose preferred-access signals through robots.txt comments, HTTP Link headers (RFC 8288), HTML metadata, verification tokens, and feed descriptors. JobDataPool is positioned as a cooperative structured alternative, never as exclusive access, mandatory usage, or an authorization mechanism. See /rfc/#publishers for the full pattern.

Review Committee Intake

Help steward JobDataPool

Submissions use the same Firebase configuration as TheHiringCafe when Firebase is enabled. Google sign-in may be required before submit so the review queue has a verified contact.
