Summary
A CSV/JSONL/Parquet dataset-redaction runner is a v1.6 headline (section 8.2) and OpenMed's answer to pyDeid-style batch de-identification (section 2.1, processing/batch.py). The separate v2.0 epic, OM-044, is the k-anonymity/generalization engine for structured identifiers. This task is the simpler, earlier-shipping runner for free-text cells: read a tabular or line-delimited file, route the free-text columns through deidentify(), and write a redacted dataset together with an aggregate audit summary. Without it, there is no batch de-id path before v2.0.
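The read/route/write flow described above can be sketched for the CSV case as follows. This is a minimal illustration, not the planned implementation: the function name redact_csv, the shape of the summary dict, and the stand-in fake_redact are all hypothetical, and the real deidentify() signature is not given in this spec, so the redactor is passed in as a plain str -> str callable.

```python
import csv
from typing import Callable, Dict, Sequence, TextIO


def redact_csv(
    src: TextIO,
    dst: TextIO,
    text_columns: Sequence[str],
    redact: Callable[[str], str],
) -> Dict[str, int]:
    """Copy a CSV from src to dst, passing the named columns through `redact`.

    Hypothetical helper: `redact` stands in for deidentify(), whose real
    signature is not specified here. Returns aggregate counts only.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    missing = [c for c in text_columns if c not in fieldnames]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")

    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()

    summary = {"rows": 0, "cells_scanned": 0, "cells_changed": 0}
    for row in reader:
        summary["rows"] += 1
        for col in text_columns:
            original = row[col] or ""  # short rows yield None for missing cells
            cleaned = redact(original)
            summary["cells_scanned"] += 1
            if cleaned != original:
                summary["cells_changed"] += 1
            row[col] = cleaned
        writer.writerow(row)  # columns not listed are copied through unchanged
    return summary


def fake_redact(text: str) -> str:
    """Placeholder redactor for demonstration only; not a de-id model."""
    return text.replace("Jane Doe", "[NAME]")
```

Called with two open text streams, the column list ["note"], and fake_redact, the helper rewrites only the note column and returns the counts. Injecting the redactor keeps the file handling testable without loading a model; JSONL and Parquet readers would need their own front ends but could share the same per-cell loop.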
Scope
Acceptance criteria
Out of scope
- k-anonymity / l-diversity / t-closeness / DP transforms on structured identifier columns (OM-044).
- Column-role classification via DataProfiler (OM-044).
- Warehouse/streaming connectors beyond local files.
Files
- openmed/processing/batch.py
- openmed/cli/redact_dataset.py
- tests/unit/processing/test_redact_dataset.py
Task: OM-055 · Milestone: v1.6 · Priority: P1 · Size: M
Depends on: OM-002, OM-031a · Blocks: —
Roadmap: section 8.2 (v1.6 headline), section 2.1 (pyDeid row)
Spec: PLANS/V2/EXECUTION/tasks/OM-055.md