- Research Dataset: Contains only discovered facts (Evidence). Not a Seed.
- Seed: A contract defining “where · how · under what conditions · how often to collect.”
- meta: country, category, date, generated_at, generator
- stats: total_domains, alive_domains, avg_response_time_ms
- domains: domain_id, source_domain, discovered, content_nature_inference, policy_hints, semantic_profile, relationships
- domain_id: Stable ID (e.g., gov:sg:mom.gov.sg). Used as the GraphDB/VectorDB node key.
- content_nature_inference: Must be explicitly marked as an inference result, not a fact (nature, confidence, method, evidence).
- policy_hints: robots_allow_crawling, sitemap_present, suspected_license, etc.
- relationships: Edge hints for GraphDB (GOVERNED_BY, affiliation, etc.).
- required: meta, source, fetch, policy, lifecycle
- meta: seed_id, version, country, category, created_at, status
- source: domain_id, source_domain, source_urls (Map: news, faq, guide…), language, trust_tier
- fetch: type (rss | html | api), plus options specific to the chosen type (RSS, HTML, or API)
- policy: crawl_allowed (must be human-approved), robots_url, sitemap_url, license
- lifecycle: source_dataset, approved_by, approved_at
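The required Seed sections listed above can be checked mechanically before a Seed is accepted. The following is a minimal sketch, not the project's actual validator: the function name `validate_seed`, the example values, and the placeholder URL are assumptions made for illustration. Only the section names and the three fetch types come from the list above.

```python
# Hypothetical validator: checks only what the list above states.
REQUIRED_SECTIONS = ("meta", "source", "fetch", "policy", "lifecycle")
FETCH_TYPES = {"rss", "html", "api"}


def validate_seed(seed: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in seed
    ]
    fetch = seed.get("fetch") or {}
    fetch_type = fetch.get("type")
    if fetch_type not in FETCH_TYPES:
        problems.append(
            f"fetch.type must be one of {sorted(FETCH_TYPES)}, got {fetch_type!r}"
        )
    return problems


# Example Seed; every value here is a placeholder, not real project data.
example_seed = {
    "meta": {
        "seed_id": "gov:sg:mom.gov.sg::news",
        "version": 1,
        "country": "sg",
        "category": "example-category",
        "created_at": "2024-01-01T00:00:00+00:00",
        "status": "draft",
    },
    "source": {
        "domain_id": "gov:sg:mom.gov.sg",
        "source_domain": "mom.gov.sg",
        "source_urls": {"news": "https://example.invalid/news"},
        "language": "en",
        "trust_tier": 1,
    },
    "fetch": {"type": "rss"},
    "policy": {"crawl_allowed": False, "robots_url": None,
               "sitemap_url": None, "license": None},
    "lifecycle": {"source_dataset": None, "approved_by": None,
                  "approved_at": None},
}

print(validate_seed(example_seed))
print(validate_seed({"meta": {}}))
```

Returning a list of problems rather than raising on the first one lets a reviewer see every missing section in a single pass.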
- domain_id: {authority}:{country}:{registrable_domain} (e.g., gov:sg:mom.gov.sg)
- seed_id: {domain_id}::{content_type} (e.g., gov:sg:mom.gov.sg::news)
- version: Integer that reflects contract changes only. Increment by 1 whenever a URL, selector, or schedule changes.
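The ID formats and the version rule above can be sketched in a few helper functions. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, and the code assumes that selectors and schedules live inside the Seed's fetch section and that URLs live in source.source_urls, which the list above does not confirm.

```python
def make_domain_id(authority: str, country: str, registrable_domain: str) -> str:
    """Build {authority}:{country}:{registrable_domain}."""
    return f"{authority}:{country}:{registrable_domain}"


def make_seed_id(domain_id: str, content_type: str) -> str:
    """Build {domain_id}::{content_type}."""
    return f"{domain_id}::{content_type}"


def parse_seed_id(seed_id: str) -> dict:
    """Split a seed_id back into its parts."""
    domain_id, content_type = seed_id.split("::", 1)
    authority, country, registrable_domain = domain_id.split(":", 2)
    return {
        "authority": authority,
        "country": country,
        "registrable_domain": registrable_domain,
        "content_type": content_type,
    }


def next_version(old_seed: dict, new_seed: dict) -> int:
    """Return version + 1 if the contract changed, else the old version.

    Assumption: the 'contract' is source.source_urls plus the whole fetch
    section (where selectors and schedules are assumed to be stored).
    """
    old_contract = (old_seed["source"]["source_urls"], old_seed["fetch"])
    new_contract = (new_seed["source"]["source_urls"], new_seed["fetch"])
    current = old_seed["meta"]["version"]
    return current + 1 if old_contract != new_contract else current


domain_id = make_domain_id("gov", "sg", "mom.gov.sg")
seed_id = make_seed_id(domain_id, "news")
print(seed_id)  # gov:sg:mom.gov.sg::news
```

Because the version tracks only the contract, editing a field such as meta.status leaves it unchanged under this sketch.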
- Research: Facts + Inference. Seed: Decisions + Contracts.
- crawl_allowed requires human approval.
- Seeds must not be embedded. Perform embedding at the article stage.
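The human-approval rule for crawl_allowed can be enforced as a gate that a crawler consults before fetching. The sketch below is one possible reading, not a confirmed implementation: the function names `approve` and `may_crawl` are hypothetical, and it assumes that approval is recorded in lifecycle.approved_by and lifecycle.approved_at. A Research hint such as robots_allow_crawling is treated as evidence only and never sets crawl_allowed on its own.

```python
import copy
from datetime import datetime, timezone


def approve(seed: dict, reviewer: str) -> dict:
    """Return a copy of the seed with crawling approved by a named human."""
    if not reviewer:
        raise ValueError("approval requires a named human reviewer")
    approved = copy.deepcopy(seed)
    approved["policy"]["crawl_allowed"] = True
    approved["lifecycle"]["approved_by"] = reviewer
    approved["lifecycle"]["approved_at"] = datetime.now(timezone.utc).isoformat()
    return approved


def may_crawl(seed: dict) -> bool:
    """True only if crawl_allowed is set AND a human approval is recorded."""
    policy = seed.get("policy") or {}
    lifecycle = seed.get("lifecycle") or {}
    return (
        policy.get("crawl_allowed") is True
        and bool(lifecycle.get("approved_by"))
        and bool(lifecycle.get("approved_at"))
    )


# A draft Seed derived from Research starts unapproved, whatever the hints say.
draft = {
    "policy": {"crawl_allowed": False},
    "lifecycle": {"source_dataset": None, "approved_by": None, "approved_at": None},
}

print(may_crawl(draft))                              # False
print(may_crawl(approve(draft, "reviewer@example")))  # True
```

Checking the lifecycle fields as well as the flag means a Seed whose crawl_allowed was flipped without a recorded approver is still refused.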