Folder and Filename Conventions TL;DR

“Unify folder rules for partitions, but file names must differ based on ‘immutable snapshots vs contract objects’.”

  • Folder structure rules must be fully unified using the Research method
  • File naming must not be unified based solely on dates
<root>/
└── country=<ISO2>/
    └── category=<logical_category>/
        └── content=<news|faq|policy|guide>/
            └── date=YYYY-MM-DD/
  • The same layout applies uniformly to GitHub, R2, and the future data lake
  • Mandatory key=value partitioning
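
As an illustration of the key=value layout above, here is a minimal Python sketch that builds and parses a partition path. The helper names `partition_path` and `parse_partitions`, the root `data`, and the category `immigration` are hypothetical and are not defined by this convention; only the partition keys and the allowed content values come from the template.

```python
from datetime import date
from pathlib import PurePosixPath

# Content values taken from the folder template above.
ALLOWED_CONTENT = {"news", "faq", "policy", "guide"}
PARTITION_KEYS = ("country", "category", "content", "date")


def partition_path(root: str, country: str, category: str,
                   content: str, day: date) -> PurePosixPath:
    """Build <root>/country=../category=../content=../date=.. (hypothetical helper)."""
    if not (len(country) == 2 and country.isascii() and country.isalpha()):
        raise ValueError(f"country must be a two-letter ISO code, got {country!r}")
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"unknown content value: {content!r}")
    values = (country, category, content, day.isoformat())
    return PurePosixPath(root, *(f"{k}={v}" for k, v in zip(PARTITION_KEYS, values)))


def parse_partitions(path: PurePosixPath) -> dict[str, str]:
    """Recover the key=value pairs from a partition path."""
    pairs = (part.split("=", 1) for part in path.parts if "=" in part)
    return {key: value for key, value in pairs}


if __name__ == "__main__":
    p = partition_path("data", "SG", "immigration", "news", date(2024, 1, 15))
    print(p)
    print(parse_partitions(p))
```

Because every level is a key=value pair, the path can be parsed back into its attributes without knowing the position of each folder.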
Data Type        | File Naming Rule              | Storage
Research Dataset | YYYY-MM-DD.json               | GitHub
Seed Contract    | v1.json, v2.json              | GitHub
robots.txt       | date=YYYY-MM-DD/robots.txt    | R2
sitemap.xml      | date=YYYY-MM-DD/sitemap.xml   | R2
homepage.html    | date=YYYY-MM-DD/homepage.html | R2
Article HTML     | ❌                            | Storage Prohibited
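
The naming rules in the table above could be encoded roughly as follows. This is a sketch under assumptions: the function `object_name` and the type labels `research_dataset`, `seed_contract`, and `article_html` are hypothetical names chosen for illustration; the resulting file names follow the table.

```python
from datetime import date
from typing import Optional

# Snapshot files stored under a date= partition, per the table above.
SNAPSHOT_FILES = {"robots.txt", "sitemap.xml", "homepage.html"}


def object_name(data_type: str, *, day: Optional[date] = None,
                version: Optional[int] = None) -> str:
    """Return the file name (or date-partitioned key) for a data type."""
    if data_type == "article_html":
        # The convention prohibits storing article HTML at all.
        raise ValueError("article HTML must not be stored")
    if data_type == "seed_contract":
        # Contract objects are versioned, not dated.
        if version is None:
            raise ValueError("seed contracts need a version number")
        return f"v{version}.json"
    if data_type == "research_dataset" or data_type in SNAPSHOT_FILES:
        # Immutable snapshots are identified by their capture date.
        if day is None:
            raise ValueError(f"{data_type} needs a date")
        if data_type == "research_dataset":
            return f"{day.isoformat()}.json"
        return f"date={day.isoformat()}/{data_type}"
    raise ValueError(f"unknown data type: {data_type!r}")


if __name__ == "__main__":
    print(object_name("research_dataset", day=date(2024, 1, 15)))
    print(object_name("seed_contract", version=2))
    print(object_name("robots.txt", day=date(2024, 1, 15)))
```

Keeping the date and version as separate keyword arguments mirrors the rule that snapshots are named by date while contract objects are named by version.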
::<registered_domain>
  • Example: gov:sg:ica.gov.sg, ngo:intl:who.int
<domain_id>::<content_type>
  • Example: gov:sg:ica.gov.sg::news
  • content_type: news, press_release, faq, guide, policy
  • IDs must be short, meaningful, and immutable
  • Version / Date / URL must be separated as attributes
  • ID ↔ Folder structure must always be reversible
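
To illustrate the ID rules above, the following Python sketch parses an ID of the form `<domain_id>::<content_type>` and formats it back, so the round trip can be checked. The field names `org_type` and `scope` are assumptions inferred from the examples (`gov:sg:...`, `ngo:intl:...`); the source does not name these segments, and the mapping from ID to folder path is not shown here because the source does not spell it out.

```python
from typing import NamedTuple

# Content types listed in the ID rules above.
CONTENT_TYPES = {"news", "press_release", "faq", "guide", "policy"}


class ContentId(NamedTuple):
    org_type: str           # hypothetical name, e.g. "gov" or "ngo"
    scope: str              # hypothetical name, e.g. "sg" or "intl"
    registered_domain: str  # e.g. "ica.gov.sg"
    content_type: str       # one of CONTENT_TYPES

    @property
    def domain_id(self) -> str:
        return f"{self.org_type}:{self.scope}:{self.registered_domain}"

    def __str__(self) -> str:
        return f"{self.domain_id}::{self.content_type}"


def parse_content_id(raw: str) -> ContentId:
    """Split '<domain_id>::<content_type>' into its parts."""
    halves = raw.split("::")
    if len(halves) != 2:
        raise ValueError(f"expected exactly one '::' separator in {raw!r}")
    domain_id, content_type = halves
    segments = domain_id.split(":")
    if len(segments) != 3 or not all(segments):
        raise ValueError(f"expected three ':'-separated segments in {domain_id!r}")
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content_type: {content_type!r}")
    return ContentId(*segments, content_type)


if __name__ == "__main__":
    cid = parse_content_id("gov:sg:ica.gov.sg::news")
    print(cid.domain_id, cid.content_type)
    print(str(cid) == "gov:sg:ica.gov.sg::news")
```

Version, date, and URL are deliberately absent from the tuple, in line with the rule that they are kept as separate attributes rather than embedded in the ID.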