Data Contracts

Canonical Raw Contract

Location: data-raw/sources/*.ndjson

Rules:

  • One file per dataset
  • NDJSON is the canonical source of truth
  • Other file formats are import-only inputs
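A minimal Python sketch of the rules above. It assumes that the dataset name is the file stem (orders.ndjson is dataset "orders") and that paths are relative to the repository root. discover_datasets and read_ndjson are hypothetical helpers and are not part of the existing pipeline. Only *.ndjson files are treated as canonical, so an import-only file (such as a CSV) in the same directory is ignored.

```python
import json
from pathlib import Path


def discover_datasets(sources_dir: Path) -> dict[str, Path]:
    """Map dataset name -> canonical NDJSON file (one file per dataset).

    Hypothetical helper; assumes the dataset name is the file stem.
    """
    return {p.stem: p for p in sorted(sources_dir.glob("*.ndjson"))}


def read_ndjson(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line of an NDJSON file."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            obj = json.loads(line)
            if not isinstance(obj, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            records.append(obj)
    return records
```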

Raw Manifest Contract

Location: data-raw/raw_manifest.json

Per-dataset fields:

  • name
  • rows
  • columns
  • dtypes
  • path
  • sha256
  • source_files
  • ingested_at_utc
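A sketch of how one per-dataset manifest entry could be computed from a canonical NDJSON file. manifest_entry, sha256_of and json_type are hypothetical helpers. Recording dtypes as JSON type names, using the first non-null value in each column, is an assumption. The actual manifest may store pandas or Arrow dtypes instead, and the manifest's top-level layout is not specified here.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, streamed in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def json_type(value) -> str:
    # Assumption: dtypes are recorded as JSON type names; the real pipeline
    # may record pandas/Arrow dtypes instead.
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"


def manifest_entry(name: str, ndjson_path: Path, source_files: list[str]) -> dict:
    """Build one per-dataset entry of data-raw/raw_manifest.json (hypothetical)."""
    rows, dtypes = 0, {}
    with ndjson_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            rows += 1
            for key, value in record.items():
                # The first non-null value seen for a column decides its type;
                # columns keep the order in which they first appear.
                if dtypes.get(key, "null") == "null":
                    dtypes[key] = json_type(value)
    return {
        "name": name,
        "rows": rows,
        "columns": list(dtypes),
        "dtypes": dtypes,
        "path": ndjson_path.as_posix(),
        "sha256": sha256_of(ndjson_path),
        "source_files": source_files,
        "ingested_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```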

Snapshot Contract

Location: data/snapshots/*.parquet

Rules:

  • Derived only from the canonical NDJSON files
  • Produced by build_snapshots_from_raw.py or the pipeline runner
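A guard can enforce the "canonical NDJSON only" rule before any snapshot is written. This sketch assumes repository-relative paths and a one-to-one mapping from orders.ndjson to orders.parquet. snapshot_path_for is a hypothetical helper. Writing the Parquet file itself is left out. With pandas, for example, that step could be pd.read_json(path, lines=True) followed by DataFrame.to_parquet, which requires pyarrow or fastparquet.

```python
from pathlib import PurePosixPath

# Assumption: paths are relative to the repository root.
RAW_SOURCES = PurePosixPath("data-raw/sources")
SNAPSHOT_DIR = PurePosixPath("data/snapshots")


def snapshot_path_for(raw_path: str) -> PurePosixPath:
    """Return the Parquet snapshot path for a canonical NDJSON input.

    Raises ValueError for anything outside data-raw/sources/*.ndjson, so a
    snapshot can never be built from an import-only file.
    """
    p = PurePosixPath(raw_path)
    if p.parent != RAW_SOURCES or p.suffix != ".ndjson":
        raise ValueError(f"not a canonical NDJSON source: {raw_path}")
    return SNAPSHOT_DIR / f"{p.stem}.parquet"
```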

Snapshot Manifest Contract

Location: data/manifests/snapshot_manifest.json

Per-dataset fields:

  • name
  • version
  • path
  • sha256
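Each snapshot entry records a sha256, so a consumer can confirm that a Parquet file still matches the manifest before reading it. This is a minimal sketch that assumes path is stored relative to the repository root. verify_snapshot is a hypothetical helper. The version value appears only in the error message because its format is not specified here.

```python
import hashlib
from pathlib import Path


def verify_snapshot(entry: dict, root: Path = Path(".")) -> None:
    """Raise ValueError if a snapshot file does not match its manifest entry."""
    missing = {"name", "version", "path", "sha256"} - set(entry)
    if missing:
        raise ValueError(f"manifest entry missing fields: {sorted(missing)}")
    # Reads the whole file at once for brevity; stream in chunks for large files.
    actual = hashlib.sha256((root / entry["path"]).read_bytes()).hexdigest()
    if actual != entry["sha256"]:
        raise ValueError(
            f"{entry['name']} v{entry['version']}: sha256 mismatch "
            f"(manifest {entry['sha256'][:12]}..., file {actual[:12]}...)"
        )
```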

Compatibility Policy

  • Additive column changes are allowed only when the schema docs and tests are updated in the same change.
  • Breaking schema changes require migration notes in docs/63_migration_guide.md.
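The policy does not define exactly which changes count as additive. One way to automate the check is sketched below. It compares schemas as {column: dtype} maps (for example, the dtypes field of the raw manifest) and treats removing or retyping a column as breaking. Both choices are assumptions, and classify_schema_change is a hypothetical helper. Under these rules, a column rename appears as a removal plus an addition and is therefore classified as breaking.

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Classify a change between two {column: dtype} schemas.

    Returns "none", "additive" (only new columns were added) or
    "breaking" (a column was removed or its dtype changed).
    """
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    if removed or retyped:
        return "breaking"
    if new.keys() - old.keys():
        return "additive"
    return "none"
```

A CI job could then require schema doc and test updates for "additive" results, and an entry in docs/63_migration_guide.md for "breaking" ones.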