
KV Dedup Policy

Policy Summary

The domain hash in KV is not a deletion target but a “Registry.”

  • ❌ Not deleted
  • ❌ TTL not used
  • ✅ Status only changed (active | deprecated | blocked | archived)
  • ✅ Operate like a Registry

① Domain Registry ID (duplicate check basis)

domain_id = "{authority}:{country}:{registrable_domain}"
domain_hash = sha256(domain_id)
  • KV Key: domain:{domain_hash}
  • KV Value: domain_id, first_seen, status, etc. Permanently maintained.
seed_id = "{domain_id}::{content_type}"
  • Each Seed belongs to exactly one domain. A domain may have multiple Seeds, one per content type (news, faq, guide, etc.).
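
The ID scheme above can be sketched in Python as follows. This is a minimal illustration, not the project's actual implementation: the function names are hypothetical, and the UTF-8 encoding and lowercase hex digest are assumptions, since the policy only states `sha256(domain_id)`. No input normalization (such as lowercasing) is applied, because the policy does not specify one.

```python
import hashlib


def make_domain_id(authority: str, country: str, registrable_domain: str) -> str:
    """Build domain_id = "{authority}:{country}:{registrable_domain}"."""
    return f"{authority}:{country}:{registrable_domain}"


def make_domain_hash(domain_id: str) -> str:
    """Hash the domain_id with SHA-256.

    Assumption: UTF-8 input and a lowercase hex digest as the output form.
    """
    return hashlib.sha256(domain_id.encode("utf-8")).hexdigest()


def make_kv_key(domain_id: str) -> str:
    """Build the KV key "domain:{domain_hash}"."""
    return f"domain:{make_domain_hash(domain_id)}"


def make_seed_id(domain_id: str, content_type: str) -> str:
    """Build seed_id = "{domain_id}::{content_type}"."""
    return f"{domain_id}::{content_type}"


domain_id = make_domain_id("gov", "sg", "ica.gov.sg")
kv_key = make_kv_key(domain_id)
seed_id = make_seed_id(domain_id, "news")
```

Because the key is derived only from `domain_id`, the same domain always maps to the same KV entry, which is what makes the entry usable as the duplicate-check basis.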

Deleting registry entries, or letting them expire through TTL, would cause the following problems:

  • Duplicate Seed generation recurs (the same domain keeps reappearing as a candidate)
  • “When collection started” cannot be proven during legal/operational audits
  • Costs recur because robots/sitemap/homepage must be re-downloaded

Example KV value:
{
  "domain_id": "gov:sg:ica.gov.sg",
  "registered_at": "2026-01-24T10:21:00Z",
  "last_seen_at": "2026-02-01T03:10:00Z",
  "status": "active",
  "seed_created": true,
  "seed_ids": ["gov:sg:ica.gov.sg::news"]
}

| Situation                | Action              |
|--------------------------|---------------------|
| Domain closed            | status = deprecated |
| Blocked by robots change | status = blocked    |
| No longer collected      | status = archived   |

Keep the ID; change only the status.
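
The registry behavior described on this page (register once, never delete, change only the status) could look roughly like the sketch below. It is an illustration under stated assumptions: `DomainRegistry` and its methods are hypothetical names, a plain dict stands in for the real KV store, and the record fields are copied from the example value on this page.

```python
import hashlib
from datetime import datetime, timezone

ALLOWED_STATUSES = {"active", "deprecated", "blocked", "archived"}


class DomainRegistry:
    """Hypothetical in-memory stand-in for the KV store. It has no delete and no TTL."""

    def __init__(self):
        self._kv = {}

    @staticmethod
    def _key(domain_id):
        # KV key: domain:{sha256(domain_id)}; the hex digest form is an assumption.
        return "domain:" + hashlib.sha256(domain_id.encode("utf-8")).hexdigest()

    def register(self, domain_id, now=None):
        """Return (record, created). An existing record is never recreated."""
        now = now or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        key = self._key(domain_id)
        record = self._kv.get(key)
        if record is not None:
            # Duplicate candidate: keep registered_at, refresh last_seen_at only.
            record["last_seen_at"] = now
            return record, False
        record = {
            "domain_id": domain_id,
            "registered_at": now,
            "last_seen_at": now,
            "status": "active",
            "seed_created": False,
            "seed_ids": [],
        }
        self._kv[key] = record
        return record, True

    def set_status(self, domain_id, status):
        """Change only the status; the key and the rest of the record stay in place."""
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {status}")
        record = self._kv[self._key(domain_id)]  # KeyError if never registered
        record["status"] = status
        return record


registry = DomainRegistry()
record, created = registry.register("gov:sg:ica.gov.sg", now="2026-01-24T10:21:00Z")
_, created_again = registry.register("gov:sg:ica.gov.sg", now="2026-02-01T03:10:00Z")
registry.set_status("gov:sg:ica.gov.sg", "deprecated")
```

In this sketch the second `register` call reports `created_again` as false, so the caller can skip Seed generation for a domain it has already seen, while `registered_at` still records when collection first started.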