JobDataPool is the canonical data infrastructure layer of the Job Pool ecosystem. It hosts the public /v1 REST API, OpenAPI 3.1 contract, JSON Schemas, versioned CSV datasets, and architecture RFCs.

How should job board publishers integrate with JobDataPool?

Publishers expose preferred-access signals through robots.txt comments, HTTP Link headers (RFC 8288), HTML metadata, verification tokens, and feed descriptors that point responsible consumers toward cached structured data. Search engine crawlers continue to behave normally.

Is JobDataPool an exclusive access layer?

No. JobDataPool is a cooperative structured alternative and traffic offload layer, not exclusive access, mandatory usage, or an authorization mechanism. Robots rules are not access authorization.

What does JPE-RFC-0001 establish?

JPE-RFC-0001 defines the Open Job Data Pool model: shared, structured, continuously refreshed job data infrastructure with provenance, freshness, and controlled openness as first-class requirements. It assigns canonical-data-domain status to jobdatapool.com.

What does JPE-RFC-0002 establish?

JPE-RFC-0002 defines the Job Pool web topology: domain inventory, role separation across mewannajob.com, jobpool.live, jobdatapool.com, and datapool.work, plus the cross-domain linking strategy that preserves SEO boundaries.

Developer Docs & RFCs

Job data API docs for builders and publishers.

This is the contract desk for the Job Pool ecosystem: JobDataPool API usage, OpenAPI and schema references, canonical dataset paths, publisher integration signals, responsible crawling policy, and the RFCs that explain why the system is split across several domains.


Table of Contents

Find the right contract fast.

Contact the team: Email the team.
Map the system: Domain roles.
Use the API: API and CSV paths.
Submit a data source: Feed, scraper, or CSV.
Publisher signals: Preferred access hints.
Access policy: Cooperative terms.
FAQ: Fast answers.
Agent context: Shared repo rules.
Architecture RFCs: Durable records.

Contact

Contact the team

To get in touch, email team@jobdatapool.com.

Surface Map

Four domains, one job data system

The topology is both a product boundary and an SEO boundary. Each site answers a different search intent, then hands users to the next surface when they need more depth.

01

`mewannajob.com`

Consumer product for job seekers: ChatGPT apply plans, mobile quick search, desktop dashboards, Q&A, and outcome-first browsing.

02

`jobdatapool.com`

Canonical infrastructure for developers: job data API docs, OpenAPI, JSON schemas, versioned datasets, publisher docs, and RFCs.

03

`jobpool.live`

Transparency layer for power users: reviewed downloads, scraper docs, staged submissions, review gates, leaderboards, and freshness signals.

04

`datapool.work`

Operations layer for contributors: source submission, scraper orchestration, moderation, review tooling, and ingestion workflows.

For API Users

Use the canonical job data API before crawling origins

JobDataPool is the structured first stop for builders, researchers, aggregators, and downstream job products. Reach for the cache first, keep attribution intact, and make origin crawling the exception instead of the default.

Cache-first workflow

Query /v1/jobs for current listings, with limit at or below 50.
Read /v1/sources before importing so your system knows the dataset repository, DVC pointer, and CSV origin.
Use /datasets/latest.csv when you need a bulk bootstrap, reproducible demo, analytics run, or offline QA fixture.
Carry attribution forward: url, apply_link, source_business_url, and ingest timestamps are part of the trust layer.
Touch origin job boards only when the canonical cache is missing, stale, or not detailed enough for the task.

Copy-ready requests

curl -s "https://api.jobdatapool.com/v1/jobs?limit=25&country_code=US"
curl -s "https://api.jobdatapool.com/v1/sources"
curl -L "https://jobdatapool.com/datasets/latest.csv"

Use the API for live query behavior. Use CSV snapshots when you need stable input files.
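As a rough sketch, the first request above can be built in Python so the documented cap on limit is applied before the call goes out. The helper name build_jobs_url and the lower bound of 1 are assumptions for illustration; only the host, path, and the cap of 50 come from this page.

```python
from urllib.parse import urlencode

API_BASE = "https://api.jobdatapool.com"  # host taken from the curl examples above
MAX_LIMIT = 50  # documented cap on `limit` for /v1/jobs


def build_jobs_url(limit: int = 25, **filters: str) -> str:
    """Build a /v1/jobs URL, clamping `limit` into 1..MAX_LIMIT.

    The lower bound of 1 is an assumption; the docs only state the upper cap.
    """
    clamped = max(1, min(limit, MAX_LIMIT))
    params = {"limit": clamped, **filters}
    return f"{API_BASE}/v1/jobs?{urlencode(params)}"


if __name__ == "__main__":
    # Same query as the first curl example.
    print(build_jobs_url(25, country_code="US"))
```

Clamping on the client keeps an oversized limit from turning into a rejected or silently truncated request.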

Endpoint quick map

Endpoint	Use it for	Notes
`/v1/jobs`	Current job listing queries	No auth in v1. Limits: 15 requests per 60s and 200 listings per 60s per IP/origin; `limit` is capped at 50 listings per request.
`/v1/sources`	Canonical source catalog	Repository, DVC pointer, R2 CSV URL, schema, and endpoint references.
`/datasets/latest.csv`	Reviewed dataset bootstrap	Redirects to the latest reviewed CSV snapshot for bulk jobs data workflows.
`/openapi.json`	Contract automation	Use for SDKs, tests, integration checks, docs generation, and agent context.
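The /v1/jobs limits in the table interact: at limit=50, the 200-listing budget is exhausted after four requests, well before the 15-request ceiling. One way a client might stay inside both budgets is a small sliding-window counter, sketched below. The class JobsBudget is hypothetical, it counts requested rather than returned listings (a conservative choice), and the server's actual accounting may differ.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple

WINDOW_SECONDS = 60.0
MAX_REQUESTS = 15   # documented: 15 requests per 60s
MAX_LISTINGS = 200  # documented: 200 listings per 60s


class JobsBudget:
    """Client-side sliding-window budget for /v1/jobs calls (illustrative)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # Each event is (timestamp, listings requested).
        self._events: Deque[Tuple[float, int]] = deque()

    def _prune(self, now: float) -> None:
        # Drop events that have aged out of the 60s window.
        while self._events and now - self._events[0][0] >= WINDOW_SECONDS:
            self._events.popleft()

    def try_acquire(self, limit: int) -> bool:
        """Record a request for `limit` listings if both budgets allow it."""
        now = self._clock()
        self._prune(now)
        used = sum(count for _, count in self._events)
        if len(self._events) >= MAX_REQUESTS or used + limit > MAX_LISTINGS:
            return False
        self._events.append((now, limit))
        return True
```

A caller would check try_acquire(limit) before each request and wait or back off when it returns False.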

For Data Sources

Submit a job data source via jobpool.live

If your site is the origin for job listings (a board, an ATS feed, a careers page, or a periodic dataset), you can plug into the pool through three documented integration paths. Reviewed scrapers, feed crawls by JobPoolBot, and direct dataset uploads all flow through the same staged review queue.

Three integration paths

Expose a structured feed (JSON, RSS, Atom) for JobPoolBot to crawl on a paced schedule.
Submit a reviewed scraper on jobpool.live/docs/scrapers when there is no first-party feed.
Upload a CSV dataset that matches job-listing.schema.json for backfills or periodic exports.
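Before uploading a CSV, it may help to check the header row locally. The sketch below only looks for the attribution columns named earlier on this page (url, apply_link, source_business_url). That list is an assumption for illustration; the authoritative column set is whatever job-listing.schema.json requires.

```python
import csv
import io
from typing import Iterable, List

# Assumption: attribution columns named in this guide, not the full
# required set from job-listing.schema.json.
ATTRIBUTION_COLUMNS = ("url", "apply_link", "source_business_url")


def missing_columns(
    csv_text: str, required: Iterable[str] = ATTRIBUTION_COLUMNS
) -> List[str]:
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    sample = "url,apply_link,title\nhttps://a.example,https://b.example,Dev\n"
    print(missing_columns(sample))
```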

Full guide: jobdatapool.com/docs/data-sources/. Operational submission UI: jobpool.live/docs/submissions.

Reference agent

JobPoolBot is the canonical data-source agent. It identifies as JobPoolBot/1.0 (+https://jobpool.live/jobpoolbot), honors robots.txt rules and Crawl-delay hints, prefers structured feeds, and routes ingested rows through the staged review queue before they appear in /v1/jobs.


For Job Board Publishers

Publish signals that reduce duplicate crawler traffic

Publisher integration should be transparent, RFC-compatible, and non-exclusive. The useful message is simple: your job board can point responsible consumers toward a structured JobDataPool feed before they repeatedly crawl origin pages.

Publisher checklist

Keep normal robots.txt, sitemap, and search crawler behavior intact.
Add comments that name your preferred structured cache or publisher feed.
Expose a standard HTTP Link header with rel="alternate" and type="application/json".
Add optional HTML metadata for CMS templates and crawlers that already parse page heads.
Prepare verification through DNS, /.well-known/, or an HTML metadata tag.

Robots example

# /robots.txt
User-agent: JobPoolBot
Allow: /

# Preferred structured cache:
# https://jobdatapool.com/v1/sources
# Try the cache/API before re-crawling origin pages.
Sitemap: https://example.com/sitemap.xml

The public crawler identifies as JobPoolBot/1.0 (+https://jobpool.live/jobpoolbot). Use JobPoolBot as the robots product token; dotted domains belong in links and comments, not in the User-agent value.
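To see how a consumer might read the robots example, the sketch below parses it with Python's standard urllib.robotparser and separately collects URLs from comment lines. Standard parsers ignore comments, so the comment_urls helper is a hypothetical convention reader, not part of any robots specification.

```python
import re
from typing import List
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: JobPoolBot
Allow: /

# Preferred structured cache:
# https://jobdatapool.com/v1/sources
# Try the cache/API before re-crawling origin pages.
Sitemap: https://example.com/sitemap.xml
"""


def comment_urls(robots_txt: str) -> List[str]:
    """Collect URLs that appear in robots.txt comment lines (informal hint)."""
    urls: List[str] = []
    for line in robots_txt.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            urls.extend(re.findall(r"https?://\S+", stripped))
    return urls


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Standard rules: the JobPoolBot group allows everything.
allowed = parser.can_fetch("JobPoolBot", "https://example.com/jobs/123")
# Sitemap lines are exposed separately (Python 3.8+).
sitemaps = parser.site_maps()
# Comment hints are outside the standard and need custom handling.
hints = comment_urls(ROBOTS_TXT)
```

The split mirrors the guidance here: access rules stay in standard directives, while the preferred-cache pointer rides along as a comment that only cooperating clients read.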

Machine-readable Signals

Discovery patterns for job board publisher feeds

These signals are small on purpose. A publisher should be able to add them without a platform migration, and a responsible data consumer should be able to find the structured feed without guessing.

HTTP discovery

Use the standard Link header first. Treat custom headers as optional hints for clients that know JobDataPool.

Link: <https://jobdatapool.com/v1/sources>; rel="alternate"; type="application/json"
X-JobDataPool-Canonical: https://jobdatapool.com/v1/sources
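On the consumer side, finding the typed alternate link could look like the sketch below. It is a simplified reader for Link header values, not a full RFC 8288 parser: it assumes quoted parameter values contain no commas or semicolons. The function names are illustrative.

```python
import re
from typing import Dict, List, Optional

# <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")
_PARAM_RE = re.compile(r';\s*([A-Za-z0-9*_-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value: str) -> List[Dict[str, str]]:
    """Split a Link header value into dicts of target plus parameters."""
    links: List[Dict[str, str]] = []
    for match in _LINK_RE.finditer(value):
        link = {"href": match.group(1)}
        for name, quoted, bare in _PARAM_RE.findall(match.group(2)):
            # Keep the first occurrence of each parameter.
            link.setdefault(name.lower(), quoted or bare)
        links.append(link)
    return links


def find_json_alternate(value: str) -> Optional[str]:
    """Return the first rel=alternate, type=application/json target, if any."""
    for link in parse_link_header(value):
        rels = link.get("rel", "").lower().split()
        if "alternate" in rels and link.get("type", "").lower() == "application/json":
            return link["href"]
    return None


if __name__ == "__main__":
    header = '<https://jobdatapool.com/v1/sources>; rel="alternate"; type="application/json"'
    print(find_json_alternate(header))
```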

HTML metadata

Add page-head hints for CMS templates, job detail pages, and crawlers already parsing HTML.

<link rel="alternate"
  type="application/json"
  href="https://jobdatapool.com/v1/sources">
<meta name="jobdatapool-canonical"
  content="https://jobdatapool.com/v1/sources">
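A crawler that already parses page heads could pick up both hints with the standard library's html.parser, roughly as follows. The class name CanonicalHintParser and the keys in its hints dict are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import Dict


class CanonicalHintParser(HTMLParser):
    """Collect JobDataPool hints from <link> and <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hints: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None for bare attributes; normalize to "".
        a = {key: (value or "") for key, value in attrs}
        if tag == "link":
            rels = a.get("rel", "").lower().split()
            if "alternate" in rels and a.get("type", "").lower() == "application/json":
                self.hints.setdefault("alternate", a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() == "jobdatapool-canonical":
            self.hints.setdefault("meta", a.get("content", ""))


PAGE_HEAD = """
<link rel="alternate"
  type="application/json"
  href="https://jobdatapool.com/v1/sources">
<meta name="jobdatapool-canonical"
  content="https://jobdatapool.com/v1/sources">
"""

hint_parser = CanonicalHintParser()
hint_parser.feed(PAGE_HEAD)
```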

Verification

Use one token across DNS, metadata, or /.well-known/ so publisher ownership can be confirmed without redesigning job pages.

jobdatapool-verification=abc123

/.well-known/jobdatapool-verification
<meta name="jobdatapool-verification" content="abc123">
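A verifier might compare the published token against the expected one as sketched below. This assumes the DNS form is a TXT record carrying the jobdatapool-verification=... string shown above; fetching the records is out of scope, and both function names are hypothetical.

```python
import hmac
from typing import Iterable, Optional

PREFIX = "jobdatapool-verification="


def token_from_txt(records: Iterable[str]) -> Optional[str]:
    """Return the token from the first matching TXT record string, if any."""
    for record in records:
        # Resolvers often return TXT values wrapped in double quotes.
        value = record.strip().strip('"')
        if value.startswith(PREFIX):
            return value[len(PREFIX):]
    return None


def token_matches(expected: str, observed: Optional[str]) -> bool:
    """Compare tokens in constant time; a missing token never matches."""
    if observed is None:
        return False
    return hmac.compare_digest(expected.encode(), observed.encode())


if __name__ == "__main__":
    records = ["v=spf1 -all", '"jobdatapool-verification=abc123"']
    print(token_matches("abc123", token_from_txt(records)))
```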

Feed descriptor

Expose a compact descriptor for publisher identity, freshness, usage context, and origin attribution.

{
  "publisher": "ExampleBoard",
  "origin": "https://example.com",
  "canonical_feed": "https://jobdatapool.com/v1/sources",
  "refresh_interval": "PT5M",
  "allowed_usage": ["aggregation", "analytics", "search"]
}
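A consumer could load the descriptor into a typed structure and turn refresh_interval into a usable duration. The sketch below assumes the value is an ISO 8601 duration and supports only the time-only subset (hours, minutes, seconds), since the standard library has no duration parser. FeedDescriptor and load_descriptor are illustrative names, not part of a published SDK.

```python
import json
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import List

_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(value: str) -> timedelta:
    """Parse time-only ISO 8601 durations such as PT5M or PT1H30M."""
    match = _DURATION_RE.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class FeedDescriptor:
    publisher: str
    origin: str
    canonical_feed: str
    refresh_interval: timedelta
    allowed_usage: List[str]


def load_descriptor(text: str) -> FeedDescriptor:
    """Build a FeedDescriptor from the JSON shape shown above."""
    raw = json.loads(text)
    return FeedDescriptor(
        publisher=raw["publisher"],
        origin=raw["origin"],
        canonical_feed=raw["canonical_feed"],
        refresh_interval=parse_duration(raw["refresh_interval"]),
        allowed_usage=list(raw["allowed_usage"]),
    )
```

With the example descriptor, refresh_interval becomes a five-minute timedelta that a scheduler can use directly.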

Preferred access

Use JobDataPool APIs or publisher feeds first.
Refresh origin job pages only when records are missing or stale.
Respect publisher rate limits, cache headers, and crawl-delay hints where supplied.
Preserve source attribution, application URLs, and publisher context.
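The "missing or stale" rule above can be expressed as a small decision function. In this sketch the ingested_at field name and the freshness threshold are assumptions; the page only says that ingest timestamps exist, not what they are called or how old is too old.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def needs_origin_refresh(
    record: Optional[Mapping[str, str]],
    now: datetime,
    max_age: timedelta,
) -> bool:
    """Return True only when the cached record is missing or stale.

    Expects `ingested_at` (hypothetical field name) as an ISO 8601 string
    with a UTC offset, and `now` as an offset-aware datetime.
    """
    if record is None:
        return True  # nothing cached: the origin is the only source
    stamp = record.get("ingested_at")
    if not stamp:
        return True  # no timestamp: treat as stale
    ingested = datetime.fromisoformat(stamp)
    return now - ingested > max_age


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    cached = {"ingested_at": "2025-01-01T11:58:00+00:00"}
    print(needs_origin_refresh(cached, now, timedelta(minutes=5)))
```

Keeping the check this narrow makes origin crawling the exception: any record that is present and fresh is served from the cache.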

Standards notes

RFC 9309 defines robots access rules, but robots rules are not access authorization.
RFC 8288 is the right home for typed alternate links.
Use comments instead of inventing unsupported robots.txt directives.
Use valid robots product tokens such as JobPoolBot, not dotted domains.

Access Policy

A pressure valve, not a gate

The positioning should stay standards-oriented and publisher-friendly. JobDataPool is a structured alternative and traffic offload layer, not a gatekeeper, authorization system, or exclusive access claim over public job data.

Use this language

preferred access
structured alternative
canonical cache
publisher-endorsed feed
responsible crawling

Avoid this language

exclusive access
mandatory usage
blocking competitors
anti-scraping coordination
robots.txt as enforcement or authorization

Developer FAQ

Fast answers for searchers and implementers

These are the questions this docs hub should answer before anyone has to spelunk through the repo or reverse-engineer the domain topology.

What is JobDataPool?

JobDataPool is the canonical infrastructure surface for Job Pool APIs, schemas, OpenAPI contracts, versioned datasets, documentation, and architecture RFCs.

How should publishers integrate?

Job board publishers can expose preferred access signals through robots comments, HTTP Link headers, HTML metadata, verification tokens, and feed descriptors.

Is JobDataPool exclusive?

No. JobDataPool is a cooperative structured alternative and traffic offload layer, not mandatory usage, authorization, or exclusive access to public job listings.

Agent Context

Cursor, Claude, and Codex read from the same source

Developer agents should not reconstruct the ecosystem from stray page copy. The repo includes tool-specific entry files that point to one markdown instruction set for domain roles, SEO boundaries, API contracts, publisher integration, and implementation rules.

Instruction index

Markdown playbooks in docs/agent-instructions/ cover monthly DVC/R2 rollouts, the /datasets/ catalog, RFC requirements, and a repo map. Start at README.md in that folder (also linked from tool entry files).


Monthly dataset rollout

Step-by-step checklist: jobpool-listings-r2 DVC pointers, public R2 artifacts, redirect-index prefixes, jobdatapool.com env, and sibling apps (mewannajob, livejobpool).


Tool entry files

AGENTS.md, CLAUDE.md, and .cursor/rules/jobdatapool.mdc all point agents at the same instruction index.


Architecture RFCs

Stable URLs for the reasoning layer

The RFCs are still the durable record. They anchor the Job Pool data model, web topology, domain responsibilities, and implementation requirements while this docs hub gives developers the quick path in.

JPE-RFC-0001: Open Job Data Pool

Status: Draft / Proposed

The foundational proposal for treating job listings as shared, structured, continuously refreshed data infrastructure.


JPE-RFC-0002: Job Pool Web Topology

Status: Draft / Proposed

The topology record for domain inventory, user journeys, cross-domain linking, and implementation requirements across the ecosystem.
