
EPIC: Train the P1/P2 clinical model families (doctype/section, relation, concept-linking) #171

@maziyarpanahi

Summary

Section 6.1 names three Build-new model families beyond the P0 PII models:

  • P1 OpenMed-DocType/Section-: routes by note type and segments sections.
  • P2 OpenMed-RelEx-Med/ADE-: joint medication-attribute and ADE-drug relation extraction, trained on public DrugProt.
  • P2 OpenMed-Link--: two-stage span->concept linking over freely-redistributable vocabs, with UMLS/SNOMED behind a user key.

All three are net-new training programs that consume the shared recipe/gates/dataset substrate. They are bundled here as one epic-of-epics to be decomposed; each maps onto its v2.0 pipeline counterpart (sections OM-042, relations OM-043, grounding OM-040).
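The family-to-pipeline mapping in the summary can be written down as a small lookup table, which may be a convenient starting point when opening the decomposition issues. This is only a sketch: `ModelFamily`, `FAMILIES`, and `families_for_task` are hypothetical names that do not exist in the repository, and the values are copied from the summary text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFamily:
    name: str           # family label as used in this issue
    priority: str       # roadmap priority from the summary
    pipeline_task: str  # v2.0 pipeline counterpart task ID


# Hypothetical registry; values are taken from the summary above.
FAMILIES = (
    ModelFamily("DocType/Section", "P1", "OM-042"),
    ModelFamily("RelEx-Med/ADE", "P2", "OM-043"),
    ModelFamily("Link", "P2", "OM-040"),
)


def families_for_task(task_id: str) -> list[str]:
    """Return the names of the families that feed the given pipeline task."""
    return [f.name for f in FAMILIES if f.pipeline_task == task_id]
```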

Scope

  • Before starting, decompose the epic into per-family training programs (DocType/Section, RelEx-Med/ADE, Link-). Each program defines its label space and head, assembles public/synthetic data (DUA-restricted data is eval-only), trains via recipe mode A/B/C, and certifies against the section 6.4 gates and tier fit.
  • DocType/Section feeds clinical/sections.py (OM-042). RelEx trains on public DrugProt, with n2c2/MADE eval-only, and feeds clinical/relations.py (OM-043). Link uses freely-redistributable vocabs, with MedMentions as the public linking eval, and feeds clinical/grounding.py (OM-040). UMLS/SNOMED are gated behind a user key and never bundled (CI assertion holds).
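The train versus eval-only split described above could be enforced with a small guard that runs before a training job is assembled. The sketch below is an assumption about how such a check might look; `DATASET_ROLES` and `check_training_sets` are hypothetical names, and MedMentions is left out of the table because the issue only describes it as the public linking eval.

```python
# Hypothetical data-role table, based on the scope bullets above.
DATASET_ROLES = {
    "DrugProt": "train",
    "n2c2": "eval_only",
    "MADE": "eval_only",
}


def check_training_sets(names):
    """Return the names as a list, or raise ValueError if any is eval-only or unknown."""
    for name in names:
        role = DATASET_ROLES.get(name)
        if role is None:
            raise ValueError(f"unknown dataset: {name}")
        if role == "eval_only":
            raise ValueError(f"{name} is eval-only and must not be used for training")
    return list(names)
```

Calling `check_training_sets(["DrugProt", "n2c2"])` raises `ValueError`, so a misconfigured run fails before any data is loaded.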

Acceptance criteria

  • Decomposition issues created before any training run.
  • Published OpenMed-DocType/Section, RelEx-Med/ADE, and Link- checkpoints clear the applicable section 6.4 gates and fit their declared tier (G5/G6), with manifests + model cards generated.
  • RelEx trained on public DrugProt (n2c2/MADE eval-only, never bundled); Link reports MedMentions numbers and no UMLS/SNOMED/CPT content is bundled (CI assertion holds).
  • Test suite is green: .venv/bin/python -m pytest tests/ -q
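The "no licensed vocabulary content is bundled" criterion refers to an existing CI assertion whose implementation is not shown in this issue. As an illustration only, a check of that kind might scan a directory for filenames typical of licensed releases. The function name and the patterns below are assumptions, not the project's actual list: UMLS Metathesaurus tables are distributed as `.RRF` files and SNOMED CT RF2 release files start with `sct2_`, while CPT is not covered by this sketch.

```python
import fnmatch
from pathlib import Path

# Illustrative filename patterns only (an assumption, not the project's list).
FORBIDDEN_PATTERNS = ("*.rrf", "sct2_*")


def find_bundled_vocab_files(root):
    """Return sorted relative paths under root whose names match a forbidden pattern."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        # Compare lowercased names so the match does not depend on file case.
        if path.is_file() and any(
            fnmatch.fnmatch(path.name.lower(), pat) for pat in FORBIDDEN_PATTERNS
        ):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)
```

A pytest test could then assert that the result is empty for the packaged directory, for example `assert find_bundled_vocab_files("openmed") == []`, where the directory name is a placeholder.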

Out of scope

  • The clinical/sections.py / relations.py / grounding.py pipeline code (OM-042, OM-043, OM-040).
  • Training the P3 SLM (OM-053).

Files

  • openmed/training/configs/laptop_lora.yaml
  • openmed/training/configs/large_teacher.yaml
  • openmed/eval/datasets/public.py

Task: OM-052 · Milestone: v2.0 · Priority: P1 · Size: XL
Depends on: OM-038a, OM-038c, OM-031b, OM-040, OM-042, OM-043 · Blocks: —
Roadmap: section 6.1
Spec: PLANS/V2/EXECUTION/tasks/OM-052.md

Labels: P1 (High) · epic (Large; decompose into child issues first) · roadmap-v2 (OpenMed V2 roadmap backlog)
