EPIC: Wire the eval gates into the daily-release engine with two-phase rollback

## Summary
Section 7.8 requires the harness to be the gate on the daily path. Every daily candidate runs openmed benchmark on golden + public SHIELD in CI, is compared against gates/baseline.json, and fails closed: a failing candidate is quarantined and an issue/chip is opened, and it is never auto-published.

Rollback is two-phase. The candidate is staged, the gates run against the staged artifacts, and only a green result flips the manifest pointer. A regression caught later by the nightly full-suite or the status monitor triggers 'openmed release rollback <family>', which returns the pointer to last-green. The <10-min/zero-human rollback SLO is measured here.

This epic is the orchestration layer atop the release-gate harness and depends on the manifest, the HF publish step, and scheduled CI. Decompose before starting.
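
A minimal sketch of the fail-closed comparison described above, for discussion during decomposition. It assumes gates/baseline.json is a flat mapping of metric name to minimum score; the real schema is not defined in this issue, and `check_gates` and `GateResult` are hypothetical names, not the release_gates.py API.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class GateResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def check_gates(candidate: dict[str, float], baseline_path: Path,
                tolerance: float = 0.0) -> GateResult:
    """Compare candidate metrics to the baseline; any doubt counts as a failure."""
    try:
        baseline = json.loads(baseline_path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        # Fail closed: an unreadable baseline blocks publication.
        return GateResult(False, [f"baseline unreadable: {exc}"])
    if not isinstance(baseline, dict) or not baseline:
        return GateResult(False, ["baseline is empty or malformed"])

    failures = []
    for metric, floor in baseline.items():
        score = candidate.get(metric)
        numeric = isinstance(floor, (int, float)) and isinstance(score, (int, float))
        if not numeric:
            # Missing or non-numeric values are treated as regressions.
            failures.append(f"{metric}: missing or non-numeric value")
        elif score < floor - tolerance:
            failures.append(f"{metric}: {score:.4f} < baseline {floor:.4f}")
    return GateResult(not failures, failures)
```

The point of the shape is that every error path returns a failing result rather than raising or defaulting to green, so the caller only publishes when `passed` is explicitly true.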

## Scope
- [ ] Decompose before starting.
- [ ] Wire release_gates.py into the daily CI job so each candidate is benchmarked on golden + SHIELD and gate-checked against gates/baseline.json, failing closed (quarantine + issue/chip).
- [ ] Implement two-phase staged rollback: stage candidate, gate against staged artifacts, flip the manifest pointer only on green.
- [ ] Extend 'openmed release rollback <family>' (manifest pointer flip + card/leaderboard/status regen), leveraging each artifact's repro_hash.
- [ ] Measure and assert the <10-min/zero-human rollback SLO.
- [ ] Have the publish job write gates/baseline.json on a green release.
- [ ] Trigger nightly full-suite + status monitor.
- [ ] Regenerate model cards, benchmark cards, leaderboard, and status page from the manifest on every green result.
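
The two-phase flow in the scope items above could be sketched roughly as follows. The manifest layout (`current`, `previous_green`, `quarantined` per family) and the names `promote` and `write_json_atomic` are assumptions made for illustration; the actual manifest format comes from the dependency tasks and is not specified here.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)  # readers see the old or the new manifest, never a partial one


def promote(manifest_path: Path, family: str, staged: str,
            gates_pass: Callable[[str], bool]) -> bool:
    """Phase 1: gate the staged candidate. Phase 2: flip the pointer only on green."""
    manifest = json.loads(manifest_path.read_text())
    entry = manifest.setdefault(
        family, {"current": None, "previous_green": None, "quarantined": []}
    )
    try:
        green = bool(gates_pass(staged))
    except Exception:
        green = False  # fail closed: a crashing gate is a failing gate

    if green:
        entry["previous_green"] = entry["current"]  # kept as the rollback target
        entry["current"] = staged
    else:
        entry["quarantined"].append(staged)  # never published
    write_json_atomic(manifest_path, manifest)
    return green
```

Usage would look like `promote(Path("manifest.json"), "ner", staged_ref, run_gates)`, where `staged_ref` and `run_gates` are placeholders for whatever the staging and gate steps produce. Opening the issue/chip and regenerating cards would hang off the returned boolean.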

## Acceptance criteria
- [ ] Epic decomposed into S/M/L tasks before implementation begins.
- [ ] A failing-gate candidate is quarantined and never published; an issue/chip is opened.
- [ ] A green candidate flips the manifest pointer and regenerates all trust artifacts.
- [ ] 'openmed release rollback <family>' restores the last-green pointer, regenerates cards, and meets the <10-min/zero-human SLO under test.
- [ ] Test suite green: .venv/bin/python -m pytest tests/ -q
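
One possible shape for the rollback SLO criterion as a pytest-style test. This is a sketch under assumptions: the in-memory manifest layout, `rollback`, and the `regenerate` callback are hypothetical stand-ins for the real 'openmed release rollback <family>' command, whose internals are not described in this issue.

```python
import time
from typing import Callable

ROLLBACK_SLO_SECONDS = 10 * 60  # the <10-min SLO from the acceptance criteria


def rollback(manifest: dict, family: str,
             regenerate: Callable[[str], None]) -> float:
    """Restore the previous green pointer, regenerate artifacts, return elapsed seconds."""
    start = time.monotonic()
    entry = manifest[family]
    target = entry.get("previous_green")
    if target is None:
        raise RuntimeError(f"no green release to roll back to for {family!r}")
    # Quarantine the regressed release, then flip the pointer back.
    entry.setdefault("quarantined", []).append(entry["current"])
    entry["current"], entry["previous_green"] = target, None
    regenerate(family)  # cards, leaderboard, status page
    return time.monotonic() - start


def test_rollback_meets_slo() -> None:
    manifest = {"ner": {"current": "ner@bad", "previous_green": "ner@good"}}
    regenerated: list[str] = []
    elapsed = rollback(manifest, "ner", regenerated.append)
    assert manifest["ner"]["current"] == "ner@good"
    assert manifest["ner"]["quarantined"] == ["ner@bad"]
    assert regenerated == ["ner"]
    assert elapsed < ROLLBACK_SLO_SECONDS
```

An in-process timer like this only bounds the pointer flip and regeneration logic. Asserting the SLO end to end would mean timing the real command in CI, from regression detection to the regenerated status page.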

## Out of scope
- The gate harness scoring logic (OM-031b, orchestrated here).
- DUA-corpus periodic promotion gate (section 3.3).

## Files
- .github/workflows/release-gates.yml

---
Task: OM-047  ·  Milestone: v2.0  ·  Priority: P1  ·  Size: XL
Depends on: OM-031b, OM-032, OM-024  ·  Blocks: —
Roadmap: section 7.8
Spec: PLANS/V2/EXECUTION/tasks/OM-047.md

