
Queues Domain Distribution and Worker Roles

Queues Domain Distribution and Input/Output Worker Roles


Document Version: 1.0
Purpose: Finalized design document outlining how domains extracted by the Seed Queue Consumer are processed at the Edge, detailing the “1 Domain = 1 Worker” distribution structure and the distinction between Input (Producer) and Output (Consumer) Workers.


  • 1 file → 1 SEED_QUEUE message → 1 Seed Consumer execution.
  • The Seed Consumer extracts N domains (e.g., 200) from that file and sends N messages (200) to the DOMAIN_QUEUE.
  • 1 Domain → 1 DOMAIN_QUEUE message → 1 Domain Consumer execution.

Therefore, the “200 domains” are already split into 200 queue messages, and each message is handled by its own Domain Consumer invocation.
In other words, the structure is designed and implemented as “200 domains per file → 200 Worker invocations, each processing 1 domain”.
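
The fan-out arithmetic above can be checked with a small sketch. The `RawFile` shape and the `countFanOut` helper are hypothetical names introduced only for illustration, and the counts assume `max_batch_size: 1` with no retries.

```typescript
// Hypothetical shape: one raw file and the domains it contains.
interface RawFile {
  path: string;
  domains: string[];
}

interface FanOut {
  seedMessages: number;       // 1 per file
  seedConsumerRuns: number;   // 1 per SEED_QUEUE message
  domainMessages: number;     // 1 per domain
  domainConsumerRuns: number; // 1 per DOMAIN_QUEUE message
}

// Counts queue messages and consumer runs, assuming one message per
// invocation (max_batch_size: 1) and ignoring retries.
function countFanOut(files: RawFile[]): FanOut {
  const seedMessages = files.length;
  const domainMessages = files.reduce((sum, f) => sum + f.domains.length, 0);
  return {
    seedMessages,
    seedConsumerRuns: seedMessages,
    domainMessages,
    domainConsumerRuns: domainMessages,
  };
}

// One file with 200 domains: 1 SEED_QUEUE message, 200 DOMAIN_QUEUE messages.
const exampleFile: RawFile = {
  path: "raw/example.json",
  domains: Array.from({ length: 200 }, (_, i) => `site${i}.example`),
};
console.log(countFanOut([exampleFile]));
```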


| Step | Trigger | Queue | Role | Message Unit |
|---|---|---|---|---|
| 1. Orchestration | HTTP POST /seeds/orchestrate | SEED_QUEUE (Producer) | Look up the R2 raw file list → send 1 message per file | 1 file |
| 2. File Processing | SEED_QUEUE (Consumer) | DOMAIN_QUEUE (Producer) | Read, parse, and extract domains from 1 raw file → send 1 message per domain | 1 file → N domains |
| 3. Domain Processing | DOMAIN_QUEUE (Consumer) | - | Collect robots.txt and sitemap per domain, save to R2 | 1 domain |

3. What the Seed Queue Consumer Actually Does


3.1 It is NOT “processing 200 domains per Worker”


What the Seed Consumer does (1 file = 1 Worker run):

  1. Read 1 raw file from R2 (1 get)
  2. Parse JSON and validate schema (in-memory processing)
  3. Extract domain list (loop, in-memory)
  4. Call DOMAIN_QUEUE.send(...) per domain (N times, e.g., 200 times)
  5. Write a success marker to {file_path}.success (1 put)

The Seed Consumer makes no per-domain external HTTP requests (robots.txt, sitemap, etc.).
Those requests are made only by the Domain Queue Consumer.
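
The five steps above might look roughly like the following sketch. It is not the actual implementation: the raw-file schema (`{ "domains": [...] }`), the `DomainMessage` shape, the empty marker body, and the `BucketLike` / `QueueLike` interfaces are assumptions, declared locally as minimal stand-ins for the R2 and Queue bindings so that the example is self-contained.

```typescript
// Minimal local stand-ins for the R2 bucket and Queue bindings (subset only).
interface BucketLike {
  get(key: string): Promise<{ text(): Promise<string> } | null>;
  put(key: string, value: string): Promise<unknown>;
}
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

// Hypothetical DOMAIN_QUEUE message body.
interface DomainMessage {
  domain: string;
  sourceFile: string;
}

// Steps 2-3: parse JSON, validate the (assumed) schema, extract domains.
function extractDomains(jsonText: string): string[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("schema validation failed: expected a JSON object");
  }
  const domains = (parsed as { domains?: unknown }).domains;
  if (!Array.isArray(domains) || !domains.every((d) => typeof d === "string")) {
    throw new Error("schema validation failed: expected { domains: string[] }");
  }
  return domains as string[];
}

// One Seed Consumer run for one file. No per-domain HTTP requests happen here.
async function processSeedFile(
  filePath: string,
  bucket: BucketLike,
  domainQueue: QueueLike<DomainMessage>,
): Promise<number> {
  const object = await bucket.get(filePath); // step 1: 1 R2 get
  if (object === null) {
    throw new Error(`raw file not found: ${filePath}`);
  }
  const domains = extractDomains(await object.text()); // steps 2-3: in memory
  for (const domain of domains) {
    await domainQueue.send({ domain, sourceFile: filePath }); // step 4: N sends
  }
  await bucket.put(`${filePath}.success`, ""); // step 5: 1 R2 put
  return domains.length;
}
```

Keeping `extractDomains` separate from the I/O makes the parsing and validation step testable without any bindings.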

  • Seed Consumer: 1 R2 read + JSON parsing + domain extraction + N queue.send calls (e.g., 200 times).
  • Domain Consumer: 2 HTTP requests per domain (robots, sitemap) + R2 write. Already distributed as 1 domain = 1 Worker.
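
For contrast, the per-domain work on the Domain Consumer side could be sketched as below. The sitemap URL (`/sitemap.xml`), the R2 key layout, and the choice to skip non-OK responses are assumptions for illustration. The fetch function is passed in as a parameter so the sketch can run without network access.

```typescript
// Minimal stand-ins: a fetch-like function and a writable bucket (subset only).
type FetchLike = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;
interface WritableBucket {
  put(key: string, value: string): Promise<unknown>;
}

// One Domain Consumer run for one domain: 2 HTTP requests, then R2 writes.
// Returns the R2 keys that were written.
async function processDomain(
  domain: string,
  fetchFn: FetchLike,
  bucket: WritableBucket,
): Promise<string[]> {
  // URL paths and key layout are assumed, not taken from the design.
  const targets = [
    { url: `https://${domain}/robots.txt`, key: `domains/${domain}/robots.txt` },
    { url: `https://${domain}/sitemap.xml`, key: `domains/${domain}/sitemap.xml` },
  ];
  const savedKeys: string[] = [];
  for (const { url, key } of targets) {
    const response = await fetchFn(url);
    if (!response.ok) {
      continue; // in this sketch a missing file is skipped, not treated as an error
    }
    await bucket.put(key, await response.text());
    savedKeys.push(key);
  }
  return savedKeys;
}
```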

  • Producer (Input Side): The side that sends messages to the queue.
  • Consumer (Output Side): The side that receives and processes messages from the queue. The same Worker script can be both a Producer and Consumer for multiple queues.
| Classification | Queue | Input (Producer) | Output (Consumer) |
|---|---|---|---|
| File-level task | SEED_QUEUE | Orchestrator (HTTP POST /seeds/orchestrate) | Seed Consumer |
| Domain-level task | DOMAIN_QUEUE | Seed Consumer (calls DOMAIN_QUEUE.send N times during file processing) | Domain Consumer |
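
If a single Worker script consumes both queues, its queue handler has to route each batch by queue name. The following is a minimal sketch of that routing, assuming hypothetical queue names and locally declared `MessageLike` / `BatchLike` interfaces that mirror a subset of the Workers message batch shape (`queue`, `messages`, `ack()`, `retry()`).

```typescript
// Local stand-ins for a subset of the Workers message and batch shapes.
interface MessageLike<T> {
  body: T;
  ack(): void;
  retry(): void;
}
interface BatchLike<T> {
  queue: string; // name of the queue this batch came from
  messages: MessageLike<T>[];
}

type Handler = (body: unknown) => Promise<void>;

// Routes a batch to the consumer registered for its queue.
// A message is acked on success and marked for retry on failure.
async function dispatchBatch(
  batch: BatchLike<unknown>,
  handlers: Record<string, Handler | undefined>,
): Promise<void> {
  const handler = handlers[batch.queue];
  if (handler === undefined) {
    throw new Error(`no consumer registered for queue: ${batch.queue}`);
  }
  for (const message of batch.messages) {
    try {
      await handler(message.body);
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

A Worker's `queue(batch, env)` handler could call `dispatchBatch` with one entry for the Seed Consumer and one for the Domain Consumer; the Seed Consumer entry is also where the Producer role for DOMAIN_QUEUE is exercised.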
  • wrangler configuration: max_batch_size: 1 → each Consumer invocation receives exactly one message, i.e. one file (SEED_QUEUE) or one domain (DOMAIN_QUEUE).
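
As an illustration, a wrangler.toml fragment with this setting might look like the following. The queue names `seed-queue` and `domain-queue` are placeholders (only the binding names SEED_QUEUE and DOMAIN_QUEUE appear in this document), and the fragment assumes a single Worker script that produces to and consumes from both queues.

```toml
# Producer bindings: expose env.SEED_QUEUE and env.DOMAIN_QUEUE to the Worker.
[[queues.producers]]
binding = "SEED_QUEUE"
queue = "seed-queue"      # placeholder queue name

[[queues.producers]]
binding = "DOMAIN_QUEUE"
queue = "domain-queue"    # placeholder queue name

# Consumers: one message per batch, so one file or one domain per invocation.
[[queues.consumers]]
queue = "seed-queue"
max_batch_size = 1

[[queues.consumers]]
queue = "domain-queue"
max_batch_size = 1
```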

| Question | Conclusion |
|---|---|
| Does the Seed Consumer process all 200 domains in a single Worker? | No. The 200 domains become 200 messages in the DOMAIN_QUEUE, and the Domain Consumer processes one domain at a time. |
| Are the 200 domains distributed across multiple Workers? | Yes. They are already distributed as “one domain per Worker” via DOMAIN_QUEUE + max_batch_size: 1 + Consumer concurrency. |
| Are Input / Output Workers distinguished? | Yes. SEED_QUEUE and DOMAIN_QUEUE each have distinct roles: Producer (Input) and Consumer (Output). |
  • Seed Consumer: For each file, R2 read + parsing + domain extraction + N calls to DOMAIN_QUEUE.send (not 200 HTTP requests).
  • Domain Consumer: For each domain: HTTP (robots/sitemap) + R2 storage → Already distributed across N Workers.