Location: data-raw/sources/*.ndjson
Rules:
- One file per dataset
- NDJSON is canonical source of truth
- Other file formats are import-only inputs
Location: data-raw/raw_manifest.json
Per-dataset fields:
- name
- rows
- columns
- dtypes
- path
- sha256
- source_files
- ingested_at_utc
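As an illustration of how one raw manifest entry could be populated, the sketch below reads a canonical NDJSON file and fills in the fields listed above. The function name `build_raw_manifest_entry`, the use of Python type names for `dtypes`, and the empty `source_files` default are assumptions made for the example; the project's actual manifest writer may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_raw_manifest_entry(ndjson_path: Path) -> dict:
    """Build one manifest entry for an NDJSON file (hypothetical helper)."""
    data = ndjson_path.read_bytes()
    records = [
        json.loads(line)
        for line in data.decode("utf-8").splitlines()
        if line.strip()
    ]

    # Track columns in first-seen order and the Python type names seen per column.
    # Assumption: dtypes are recorded as type names; the real format is not specified.
    dtypes: dict[str, set] = {}
    for record in records:
        for key, value in record.items():
            dtypes.setdefault(key, set()).add(type(value).__name__)

    return {
        "name": ndjson_path.stem,
        "rows": len(records),
        "columns": list(dtypes),
        "dtypes": {col: sorted(kinds) for col, kinds in dtypes.items()},
        "path": ndjson_path.as_posix(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "source_files": [],  # assumption: import-only inputs would be listed here
        "ingested_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the raw bytes rather than the parsed records means any change to the file, including whitespace, produces a different `sha256`.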
Location: data/snapshots/*.parquet
Rules:
- Derived from canonical NDJSON only
- Produced by build_snapshots_from_raw.py / pipeline runner
Location: data/manifests/snapshot_manifest.json
Per-dataset fields:
- name
- version
- path
- sha256
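One way the `sha256` field could be used is to detect snapshots that no longer match the manifest. The sketch below is a minimal check under stated assumptions: the manifest is taken to hold a top-level `datasets` list and `path` is taken to be relative to a root directory. Neither detail is confirmed by this document, and `find_stale_snapshots` is a hypothetical helper.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large snapshots are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_stale_snapshots(manifest_path: Path, root: Path) -> list[str]:
    """Return names of datasets whose file is missing or whose hash differs."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    stale = []
    # Assumption: entries live under a top-level "datasets" key.
    for entry in manifest["datasets"]:
        target = root / entry["path"]
        if not target.is_file() or sha256_of_file(target) != entry["sha256"]:
            stale.append(entry["name"])
    return stale
```

A non-empty result would suggest rebuilding the listed snapshots from the canonical NDJSON rather than editing the Parquet files directly.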
- Additive column changes allowed only with schema docs and test updates.
- Breaking schema changes require migration notes in docs/63_migration_guide.md.
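The two rules above could be enforced in a test by comparing a dataset's old and new column-to-dtype mappings. The sketch below assumes that "breaking" means a column was removed or its dtype changed, and that "additive" means only new columns appeared. The document does not define these terms, so treat this as one plausible reading, with `classify_schema_change` as a hypothetical helper.

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Classify a schema diff as 'breaking', 'additive', or 'unchanged'."""
    removed = old.keys() - new.keys()
    # Assumption: a dtype change on an existing column counts as breaking.
    retyped = {col for col in old.keys() & new.keys() if old[col] != new[col]}
    if removed or retyped:
        return "breaking"
    if new.keys() - old.keys():
        return "additive"
    return "unchanged"


old_schema = {"id": "int64", "name": "string"}
print(classify_schema_change(old_schema, {**old_schema, "email": "string"}))
print(classify_schema_change(old_schema, {"id": "int64"}))
```

A check like this only flags the kind of change; whether schema docs, tests, or migration notes were actually updated still needs review.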