EPIC: Train the P0 tiny direct-identifier PII family #167

@maziyarpanahi

Description

Summary

Roadmap section 6.1 defines the P0 family OpenMed-PII-DirectID-: a high-recall tiny detector, phone-resident by default, for deterministic identifiers (SSN, MRN, NPI, IBAN, credit card, email, phone, API key, account number) where regex/checksum backstops apply. Status: detectors exist, but there is no dedicated tiny direct-ID head, and no trained head integrates the regex/checksum backstop. This is a model-training program: decompose it into child issues before starting, train via recipe mode A, and clear the section 6.4 gates G1b (>=99.5% structured-ID recall), G3 (zero leakage), G4 (quantization delta), and G5 (Tiny-tier fit).
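For reference while decomposing, here is a minimal sketch of the kind of checksum backstop the summary refers to. It is not the contents of openmed/core/safety_sweep.py; the function names are hypothetical. The checks themselves are the standard public ones: Luhn mod-10 for card numbers, Luhn over the 80840-prefixed number for NPI, and ISO 13616 mod-97 for IBAN. The IBAN check below does not validate per-country lengths.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn mod-10 check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit (the check digit is not doubled).
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def npi_valid(npi: str) -> bool:
    """NPI: exactly 10 digits, Luhn-checked after prefixing the constant 80840."""
    return bool(re.fullmatch(r"\d{10}", npi)) and luhn_valid("80840" + npi)


def iban_valid(iban: str) -> bool:
    """IBAN: move the first four characters to the end, map letters to 10..35, test mod 97 == 1."""
    s = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

A checksum pass only says a string is well formed, not that it is a real identifier, so a backstop like this would filter regex candidates rather than replace the trained head.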

Scope

  • Decompose before starting: head definition, dataset assembly (public + synthetic + hard negatives), recipe-mode-A training run, regex/checksum backstop integration (safety sweep), quantization (INT8 default, INT4 if recall holds), and gate certification.
  • Target the Tiny tier (phone-resident); certify against G1b, G3, G4, G5.
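One possible shape for the backstop integration step, as a sketch only: take the union of the head's predicted spans and the regex hits, so a span the model misses is still caught. The `Span` type, the `sweep` function, and the two patterns are hypothetical illustrations, not the existing safety_sweep.py API, and the patterns are deliberately simplistic.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    source: str  # "model" or "backstop"


# Illustrative patterns only; a real backstop would cover every identifier type.
BACKSTOP_PATTERNS = (
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
)


def backstop_spans(text: str) -> List[Span]:
    """Collect every regex hit as a backstop span."""
    return [
        Span(m.start(), m.end(), label, "backstop")
        for label, pattern in BACKSTOP_PATTERNS
        for m in pattern.finditer(text)
    ]


def sweep(text: str, model_spans: List[Span]) -> List[Span]:
    """Keep all model spans; add backstop spans not fully covered by a model span."""
    merged = list(model_spans)
    for b in backstop_spans(text):
        covered = any(m.start <= b.start and b.end <= m.end for m in model_spans)
        if not covered:
            merged.append(b)
    return sorted(merged, key=lambda s: (s.start, s.end))


text = "Contact jane@example.org, SSN 123-45-6789."
model_only = [Span(8, 24, "EMAIL", "model")]  # pretend the head missed the SSN
result = sweep(text, model_only)
```

Taking the union favors recall over precision, which matches the high-recall goal stated in the summary. Whether the real sweep resolves overlaps this way is a decision for the decomposition issues.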

Acceptance criteria

  • Decomposition issues created before any training run.
  • A published OpenMed-PII-DirectID- checkpoint clears G1b, G3, G4, and G5 with a signed gate report.
  • Manifest + signed gate report + model card generated for the checkpoint.
  • Test suite is green: .venv/bin/python -m pytest tests/ -q
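The gate criteria could be checked along the following lines. This is a hedged sketch, not the gate harness (which is out of scope here and owned by other tasks). Only the G1b threshold (>=99.5%) and the G3 target (0) come from this issue. The G4 delta limit and the G5 size budget are caller-supplied because this issue does not state them. Treating G4 as a recall drop and G5 as a byte budget are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GateInputs:
    structured_id_total: int   # gold structured-ID spans in the eval set
    structured_id_found: int   # of those, spans the detector recovered
    leakage_count: int         # G3 requires exactly 0; definition lives in section 6.4
    recall_fp32: float         # recall before quantization (assumed G4 metric)
    recall_quantized: float    # recall after quantization
    model_bytes: int           # checkpoint size (assumed G5 metric)


def evaluate_gates(
    x: GateInputs, *, max_quant_delta: float, tiny_budget_bytes: int
) -> Dict[str, bool]:
    """Return a pass/fail flag per gate."""
    # Integer comparison avoids float rounding at the 99.5% boundary.
    g1b = (
        x.structured_id_total > 0
        and x.structured_id_found * 1000 >= x.structured_id_total * 995
    )
    return {
        "G1b": g1b,
        "G3": x.leakage_count == 0,
        "G4": (x.recall_fp32 - x.recall_quantized) <= max_quant_delta,
        "G5": x.model_bytes <= tiny_budget_bytes,
    }
```

A checkpoint would count as certified only if every value in the returned mapping is true. The signed gate report and manifest are separate artifacts not sketched here.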

Out of scope

  • The recipe scaffolding, gate harness, hard-negative harness, and dataset adapters (OM-038a/b/c, OM-031b). These are consumed here, not built here.

Files

  • openmed/training/configs/tiny_distill.yaml
  • openmed/core/safety_sweep.py

Task: OM-050 · Milestone: v1.7 · Priority: P0 · Size: XL
Depends on: OM-038a, OM-038b, OM-031b, OM-008 · Blocks: —
Roadmap: section 6.1
Spec: PLANS/V2/EXECUTION/tasks/OM-050.md

Metadata

Assignees: No one assigned
Labels: P0 (Critical) · epic (Large; decompose into child issues first) · roadmap-v2 (OpenMed V2 roadmap backlog)
Projects: No projects
Relationships: None yet
Development: No branches or pull requests