Contact

Contact the team

To get in touch, email team@jobdatapool.com.

Surface Map

Four domains, one job data system

The topology is both a product boundary and an SEO boundary. Each site answers a different search intent, then hands users to the next surface when they need more depth.

01

mewannajob.com

Consumer product for job seekers: ChatGPT apply plans, mobile quick search, desktop dashboards, Q&A, and outcome-first browsing.

02

jobdatapool.com

Canonical infrastructure for developers: job data API docs, OpenAPI, JSON schemas, versioned datasets, publisher docs, and RFCs.

03

jobpool.live

Transparency layer for power users: reviewed downloads, scraper docs, staged submissions, review gates, leaderboards, and freshness signals.

04

datapool.work

Operations layer for contributors: source submission, scraper orchestration, moderation, review tooling, and ingestion workflows.

For API Users

Use the canonical job data API before crawling origins

JobDataPool is the structured first stop for builders, researchers, aggregators, and downstream job products. Reach for the cache first, keep attribution intact, and make origin crawling the exception instead of the default.

Cache-first workflow

  1. Query /v1/jobs for current listings, with limit at or below 50.
  2. Read /v1/sources before importing so your system knows the dataset repository, DVC pointer, and CSV origin.
  3. Use /datasets/latest.csv when you need a bulk bootstrap, reproducible demo, analytics run, or offline QA fixture.
  4. Carry attribution forward: url, apply_link, source_business_url, and ingest timestamps are part of the trust layer.
  5. Touch origin job boards only when the canonical cache is missing, stale, or not detailed enough for the task.
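Step 4 is easy to lose in a data pipeline, so it can help to make it a hard check. A minimal Python sketch, assuming listings arrive as dicts: url, apply_link, and source_business_url come from the workflow above, while `ingested_at` and the `carry_attribution` helper are hypothetical names, since the exact ingest timestamp field is not given here.

```python
from typing import Any

# Attribution fields from the cache-first workflow. "ingested_at" is a
# hypothetical name for the ingest timestamp; confirm the real field in
# /openapi.json before relying on it.
ATTRIBUTION_FIELDS = ("url", "apply_link", "source_business_url", "ingested_at")


def carry_attribution(listing: dict[str, Any], derived: dict[str, Any]) -> dict[str, Any]:
    """Build a downstream record that keeps the listing's attribution intact.

    `derived` holds whatever your product computes (scores, tags, summaries).
    Attribution values are copied last, so a derived key with the same name
    can never overwrite them.
    """
    missing = [field for field in ATTRIBUTION_FIELDS if not listing.get(field)]
    if missing:
        # Fail loudly rather than publish an unattributed record.
        raise ValueError(f"listing is missing attribution fields: {missing}")
    record = dict(derived)
    for field in ATTRIBUTION_FIELDS:
        record[field] = listing[field]
    return record
```

Raising on missing attribution is a deliberate choice: it is cheaper to drop one record than to ship a batch that cannot be traced back to its origin.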

Copy-ready requests

curl -s "https://api.jobdatapool.com/v1/jobs?limit=25&country_code=US"
curl -s "https://api.jobdatapool.com/v1/sources"
curl -L "https://jobdatapool.com/datasets/latest.csv"

Use the API for live query behavior. Use CSV snapshots when you need stable input files.
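For scripts that should not shell out to curl, the same API requests can be made with Python's standard library. A minimal sketch: `jobs_url` and `get_json` are hypothetical helper names, only `limit` and `country_code` come from the examples above, and clamping `limit` client-side simply mirrors the documented cap of 50.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.jobdatapool.com"
MAX_LIMIT = 50  # documented per-request cap for /v1/jobs


def jobs_url(limit: int = 25, **filters: str) -> str:
    """Build a /v1/jobs URL, never asking for more than MAX_LIMIT listings."""
    params = {"limit": min(limit, MAX_LIMIT), **filters}
    return f"{API_BASE}/v1/jobs?{urllib.parse.urlencode(params)}"


def get_json(url: str, timeout: float = 10.0):
    """GET a URL and decode its JSON body; v1 needs no auth header."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `get_json(jobs_url(25, country_code="US"))` mirrors the first curl command, and `get_json(API_BASE + "/v1/sources")` mirrors the second.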

Endpoint quick map

Endpoint | Use it for | Notes
/v1/jobs | Current job listing queries | No auth in v1. limit is capped at 50 listings per request. Rate limits: 5 requests per 60 s and 200 listings per 60 s per IP/origin.
/v1/sources | Canonical source catalog | Repository, DVC pointer, R2 CSV URL, schema, and endpoint references.
/datasets/latest.csv | Reviewed dataset bootstrap | Redirects to the latest reviewed CSV snapshot for bulk job data workflows.
/openapi.json | Contract automation | Use for SDKs, tests, integration checks, docs generation, and agent context.
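A client can pace itself against the /v1/jobs limits instead of discovering them through rejected requests. A minimal sketch of a sliding-window budget, assuming the server counts over a rolling 60-second window (its exact window semantics are not documented here); `JobsBudget` is a hypothetical name.

```python
from collections import deque

WINDOW_S = 60.0
MAX_REQUESTS = 5     # requests per window
MAX_LISTINGS = 200   # listings per window
MAX_LIMIT = 50       # listings per request


class JobsBudget:
    """Client-side tracker for the documented /v1/jobs limits.

    The server stays the authority; this only avoids requests that would
    obviously exceed the published numbers.
    """

    def __init__(self) -> None:
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, listings)

    def _expire(self, now: float) -> None:
        # Drop requests that have left the rolling window.
        while self._events and now - self._events[0][0] >= WINDOW_S:
            self._events.popleft()

    def wait_time(self, limit: int, now: float) -> float:
        """Seconds to wait before a request with this limit fits the budget."""
        limit = min(limit, MAX_LIMIT)
        self._expire(now)
        requests = len(self._events)
        listings = sum(n for _, n in self._events)
        wait = 0.0
        # Walk the window oldest-first, letting each event expire in turn,
        # until both the request count and the listing count leave room.
        for ts, n in self._events:
            if requests < MAX_REQUESTS and listings + limit <= MAX_LISTINGS:
                break
            wait = ts + WINDOW_S - now
            requests -= 1
            listings -= n
        return max(0.0, wait)

    def record(self, limit: int, now: float) -> None:
        """Log a request that was actually sent."""
        self._events.append((now, min(limit, MAX_LIMIT)))
```

Because 5 requests × 50 listings = 250 exceeds the 200-listing allowance, a client fetching at limit=50 hits the listing cap after four requests, before the request cap comes into play.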

For Data Sources

Submit a job data source via jobpool.live

If your site is the origin for job listings (a board, an ATS feed, a careers page, or a periodic dataset), you can plug into the pool through three documented integration paths: reviewed scrapers, feed crawls by JobPoolBot, and direct dataset uploads. All three flow through the same staged review queue.

For Job Board Publishers

Publish signals that reduce duplicate crawler traffic

Publisher integration should be transparent, RFC-compatible, and non-exclusive. The useful message is simple: your job board can point responsible consumers toward a structured JobDataPool feed before they repeatedly crawl origin pages.

Publisher checklist

  1. Keep normal robots.txt, sitemap, and search crawler behavior intact.
  2. Add robots.txt comments that name your preferred structured cache or publisher feed.
  3. Expose a standard HTTP Link header with rel="alternate" and type="application/json".
  4. Add optional HTML metadata for CMS templates and crawlers that already parse page heads.
  5. Prepare ownership verification through DNS, a /.well-known/ file, or an HTML meta tag.

Robots example

# /robots.txt
User-agent: JobPoolBot
Allow: /

# Preferred structured cache:
# https://jobdatapool.com/publisher/exampleboard
# Try the cache/API before re-crawling origin pages.
Sitemap: https://example.com/sitemap.xml

The public crawler identifies as JobPoolBot/1.0 (+https://jobpool.live/jobpoolbot). Use JobPoolBot as the robots product token; dotted domains belong in links and comments, not in the User-agent value.
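The robots example can be checked with Python's standard `urllib.robotparser`, which matches groups on the product token before the "/" in the full identification string. A minimal sketch: the robots rules are standard, but the preferred cache lives only in comments, which robotparser ignores, so the comment scan at the end is a heuristic for this example's layout, not a standard.

```python
import urllib.robotparser

# The robots.txt example above, verbatim.
ROBOTS_TXT = """\
# /robots.txt
User-agent: JobPoolBot
Allow: /

# Preferred structured cache:
# https://jobdatapool.com/publisher/exampleboard
# Try the cache/API before re-crawling origin pages.
Sitemap: https://example.com/sitemap.xml
"""

# Full identification string; robotparser compares the part before "/"
# against each User-agent line, case-insensitively.
USER_AGENT = "JobPoolBot/1.0 (+https://jobpool.live/jobpoolbot)"

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch(USER_AGENT, "https://example.com/jobs/123")
sitemaps = parser.site_maps()  # Python 3.8+

# Heuristic: comment lines whose text is a bare https URL.
comment_urls = [
    line[1:].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.startswith("#") and line[1:].strip().startswith("https://")
]
print(allowed, sitemaps, comment_urls)
```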

Machine-readable Signals

Discovery patterns for job board publisher feeds

These signals are small on purpose. A publisher should be able to add them without a platform migration, and a responsible data consumer should be able to find the structured feed without guessing.

HTTP discovery

Use the standard Link header first. Treat custom headers as optional hints for clients that know JobDataPool.

Link: <https://jobdatapool.com/publisher/exampleboard>; rel="alternate"; type="application/json"
X-JobDataPool-Canonical: https://jobdatapool.com/publisher/exampleboard
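A consumer can apply the "Link header first" rule mechanically. A minimal Python sketch with hypothetical helper names: it handles the common RFC 8288 shapes (several comma-separated links, quoted parameters, space-separated rel values) but assumes quoted values contain no ";" or ",", so a production client may want a full parser.

```python
import re
from typing import Optional


def parse_link_header(value: str) -> list[tuple[str, dict[str, str]]]:
    """Split a Link header into (target, params) pairs (simplified RFC 8288)."""
    links = []
    for part in re.split(r",\s*(?=<)", value.strip()):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        target, rest = match.groups()
        params = {}
        for raw in rest.split(";")[1:]:
            key, _, val = raw.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def discover_feed(headers) -> Optional[str]:
    """Prefer rel="alternate" type="application/json"; fall back to the custom hint.

    `headers` is any mapping with .get(); http.client's HTTPMessage matches
    names case-insensitively, a plain dict does not.
    """
    for target, params in parse_link_header(headers.get("Link", "")):
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() == "application/json":
            return target
    return headers.get("X-JobDataPool-Canonical")
```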

HTML metadata

Add page-head hints for CMS templates, job detail pages, and crawlers already parsing HTML.

<link rel="alternate"
  type="application/json"
  href="https://jobdatapool.com/publisher/exampleboard">
<meta name="jobdatapool-canonical"
  content="https://jobdatapool.com/publisher/exampleboard">
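On the consumer side, these page-head hints can be read with the standard library's `html.parser`, which already lowercases tag and attribute names and handles attributes split across lines. A minimal sketch; `FeedHintParser` and `feed_from_html` are hypothetical names, and preferring the `<link>` over the `<meta>` hint mirrors the "standard first, custom second" order used for HTTP headers.

```python
from html.parser import HTMLParser
from typing import Optional


class FeedHintParser(HTMLParser):
    """Collect JobDataPool feed hints from <link> and <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.alternate_json: Optional[str] = None
        self.meta_canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; normalize them to "".
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and self.alternate_json is None:
            rels = a.get("rel", "").lower().split()
            if "alternate" in rels and a.get("type", "").lower() == "application/json":
                self.alternate_json = a.get("href") or None
        elif tag == "meta" and a.get("name", "").lower() == "jobdatapool-canonical":
            self.meta_canonical = a.get("content") or None


def feed_from_html(html: str) -> Optional[str]:
    """Return the standard alternate link if present, else the meta hint."""
    parser = FeedHintParser()
    parser.feed(html)
    parser.close()
    return parser.alternate_json or parser.meta_canonical
```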

Verification

Use the same token in DNS, HTML metadata, or a /.well-known/ file so publisher ownership can be confirmed without redesigning job pages.

DNS:              jobdatapool-verification=abc123
Well-known file:  /.well-known/jobdatapool-verification
HTML metadata:    <meta name="jobdatapool-verification" content="abc123">
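The "one token, any channel" rule reduces to a small comparison once each channel's value has been fetched. A minimal sketch under stated assumptions: the function takes already-fetched values (reading DNS records needs a resolver library outside the standard library), the DNS value uses the `jobdatapool-verification=<token>` form shown, and the /.well-known/ file and meta tag may hold either that form or the bare token. `is_verified` and `normalize_token` are hypothetical names.

```python
from typing import Iterable, Optional

PREFIX = "jobdatapool-verification="


def normalize_token(value: str) -> str:
    """Accept either the bare token or the 'jobdatapool-verification=<token>' form."""
    value = value.strip().strip('"')
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def is_verified(expected: str, *, dns_txt: Iterable[str] = (),
                well_known_body: Optional[str] = None,
                meta_content: Optional[str] = None) -> bool:
    """Return True if any single channel carries the expected token."""
    # DNS: only values with the documented prefix count, so unrelated
    # records on the same name (SPF and so on) are ignored.
    candidates = [
        normalize_token(v) for v in dns_txt
        if v.strip().strip('"').startswith(PREFIX)
    ]
    if well_known_body is not None:
        candidates.append(normalize_token(well_known_body))
    if meta_content is not None:
        candidates.append(normalize_token(meta_content))
    return expected in candidates
```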

Feed descriptor

Expose a compact descriptor for publisher identity, freshness, usage context, and origin attribution.

{
  "publisher": "ExampleBoard",
  "origin": "https://example.com",
  "canonical_feed": "https://jobdatapool.com/publisher/exampleboard",
  "refresh_interval": "PT5M",
  "allowed_usage": ["aggregation", "analytics", "search"]
}
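A consumer will usually want to validate the descriptor and turn `refresh_interval`, an ISO 8601 duration, into seconds. A minimal sketch: which fields count as required, and the restriction to time-only durations such as "PT5M", are assumptions for illustration, not a published schema; `load_descriptor` and `duration_seconds` are hypothetical names.

```python
import json
import re
from urllib.parse import urlparse

# Assumed required fields; allowed_usage is treated as optional.
REQUIRED = ("publisher", "origin", "canonical_feed", "refresh_interval")
# Time-only ISO 8601 durations (PTnHnMnS), enough for values like "PT5M".
DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?")


def duration_seconds(value: str) -> float:
    match = DURATION.fullmatch(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def load_descriptor(text: str) -> dict:
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"descriptor missing fields: {missing}")
    for key in ("origin", "canonical_feed"):
        parts = urlparse(data[key])
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"{key} is not an absolute http(s) URL: {data[key]!r}")
    data["refresh_seconds"] = duration_seconds(data["refresh_interval"])
    data.setdefault("allowed_usage", [])
    return data


# The example descriptor above.
EXAMPLE_DESCRIPTOR = """{
  "publisher": "ExampleBoard",
  "origin": "https://example.com",
  "canonical_feed": "https://jobdatapool.com/publisher/exampleboard",
  "refresh_interval": "PT5M",
  "allowed_usage": ["aggregation", "analytics", "search"]
}"""
descriptor = load_descriptor(EXAMPLE_DESCRIPTOR)
```

Here "PT5M" becomes 300 seconds, which a consumer can use as the freshness window for records from this feed.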

Preferred access

  • Use JobDataPool APIs or publisher feeds first.
  • Refresh origin job pages only when records are missing or stale.
  • Respect publisher rate limits, cache headers, and crawl-delay hints where supplied.
  • Preserve source attribution, application URLs, and publisher context.
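The "origin only when missing or stale" rule, together with the cache-first workflow's "not detailed enough" case, fits in one decision function. A minimal sketch: `fetched_at` is a hypothetical timezone-aware timestamp your own client stores, and `max_age` might come from a publisher's refresh_interval (for example, five minutes for "PT5M").

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def choose_source(cached: Optional[dict], now: datetime, max_age: timedelta,
                  required_fields: tuple[str, ...] = ()) -> str:
    """Return "cache" when the cached record is usable, else "origin"."""
    if cached is None:
        return "origin"  # missing from the cache
    if now - cached["fetched_at"] > max_age:
        return "origin"  # stale
    if any(not cached.get(field) for field in required_fields):
        return "origin"  # not detailed enough for this task
    return "cache"
```

Even when this returns "origin", the publisher's rate limits, cache headers, and crawl-delay hints still apply to the origin request.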

Standards notes

  • RFC 9309 defines robots access rules, but robots rules are not access authorization.
  • RFC 8288 (Web Linking) defines the Link header and is the right home for typed alternate links.
  • Use comments instead of inventing unsupported robots.txt directives.
  • Use valid robots product tokens such as JobPoolBot, not dotted domains.
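The product-token rule is precise enough to check automatically. RFC 9309 limits a product token to "*" or one or more letters, underscores, and hyphens, and says it SHOULD appear in the crawler's identification string. A minimal sketch with hypothetical helper names:

```python
import re

# RFC 9309: product-token = identifier / "*", where identifier is one or
# more of "-", "A-Z", "_", "a-z". Digits, dots, and slashes are excluded.
PRODUCT_TOKEN = re.compile(r"\*|[A-Za-z_-]+")


def is_valid_product_token(token: str) -> bool:
    return PRODUCT_TOKEN.fullmatch(token) is not None


def token_in_user_agent(token: str, user_agent: str) -> bool:
    """Check that the token appears in the identification string (case-insensitive)."""
    return token.lower() in user_agent.lower()
```

Under this rule "JobPoolBot" is valid, while "jobpool.live" and "JobPoolBot/1.0" are not usable as User-agent values.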

Access Policy

A pressure valve, not a gate

The positioning should stay standards-oriented and publisher-friendly. JobDataPool is a structured alternative and traffic offload layer, not a gatekeeper, authorization system, or exclusive access claim over public job data.

Use this language

  • preferred access
  • structured alternative
  • canonical cache
  • publisher-endorsed feed
  • responsible crawling

Avoid this language

  • exclusive access
  • mandatory usage
  • blocking competitors
  • anti-scraping coordination
  • robots.txt as enforcement or authorization

Developer FAQ

Fast answers for searchers and implementers

These are the questions this docs hub should answer before anyone has to spelunk through the repo or reverse-engineer the domain topology.

What is JobDataPool?

JobDataPool is the canonical infrastructure surface for Job Pool APIs, schemas, OpenAPI contracts, versioned datasets, documentation, and architecture RFCs.

How should publishers integrate?

Job board publishers can expose preferred access signals through robots comments, HTTP Link headers, HTML metadata, verification tokens, and feed descriptors.

Is JobDataPool exclusive?

No. JobDataPool is a cooperative structured alternative and traffic offload layer. Using it is not mandatory, it is not an authorization system, and it claims no exclusive access to public job listings.

Agent Context

Cursor, Claude, and Codex read from the same source

Developer agents should not reconstruct the ecosystem from stray page copy. The repo includes tool-specific entry files that point to one markdown instruction set for domain roles, SEO boundaries, API contracts, publisher integration, and implementation rules.

Instruction index

Markdown playbooks in docs/agent-instructions/ cover monthly DVC/R2 rollouts, the /datasets/ catalog, RFC requirements, and a repo map. Start at README.md in that folder (also linked from tool entry files).

Monthly dataset rollout

Step-by-step checklist covering jobpool-listings-r2 DVC pointers, public R2 artifacts, redirect-index prefixes, the jobdatapool.com environment, and sibling apps (mewannajob, livejobpool).


Tool entry files

AGENTS.md, CLAUDE.md, and .cursor/rules/jobdatapool.mdc all point agents at the same instruction index.

Architecture RFCs

Stable URLs for the reasoning layer

The RFCs are still the durable record. They anchor the Job Pool data model, web topology, domain responsibilities, and implementation requirements while this docs hub gives developers the quick path in.

JPE-RFC-0001: Open Job Data Pool

Status: Draft / Proposed

The foundational proposal for treating job listings as shared, structured, continuously refreshed data infrastructure.

JPE-RFC-0002: Job Pool Web Topology

Status: Draft / Proposed

The topology record for domain inventory, user journeys, cross-domain linking, and implementation requirements across the ecosystem.