| 1 | +// SPDX-License-Identifier: MPL-2.0 |
| 2 | +// SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk> |
| 3 | + |
| 4 | +# CICD Optimization Roadmap — Ultra-Zotta-Plan |
| 5 | + |
| 6 | +**Date:** 2026-06-05 |
| 7 | +**Status:** DRAFT |
| 8 | +**Owner:** estate-wide |
| 9 | +**Priority:** CRITICAL (cost + velocity blocker) |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## Executive Summary |
| 14 | + |
| 15 | +The estate currently runs **~281 secret-scanner deployments** plus **redundant build workflows** across 1,191+ repos. Analysis of `verisimdb#114` shows **duplicate job names** (`bench-compile`, `audit`) across `rust-ci.yml`, `elixir-ci.yml`, and `build-validation.yml`, which causes CI queuing delays and wasted CI minutes. |
| 16 | + |
| 17 | +**Estimated waste:** ~30-40% of CI minutes from redundancy + path-filter misses. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Track 1: Immediate Redundancy Elimination (Week 1) |
| 22 | + |
| 23 | +### 1.1 Remove trufflehog from secret-scanner ✅ **DONE** |
| 24 | +- **File:** `standards/.github/workflows/secret-scanner-reusable.yml` |
| 25 | +- **Change:** Deleted `trufflehog:` job (13 s saved per run × 1,191 repos ≈ 258 minutes per estate-wide run) |
| 26 | +- **Impact:** ~5,000 minutes/month saved (consistent with roughly 20 scanner runs per repo per month) |
| 27 | + |
| 28 | +### 1.2 Consolidate build workflows in verisimdb |
| 29 | +- **Problem:** `build-validation.yml` + `rust-ci.yml` + `elixir-ci.yml` = duplicate `bench-compile` + `audit` jobs |
| 30 | +- **Action:** |
| 31 | +  - Merge `build-validation.yml` into `rust-ci.yml` and `elixir-ci.yml` with path filters |
| 32 | +  - OR: Delete `build-validation.yml` and rely on the full CI workflows |
| 33 | +- **Estimated savings:** ~2-3 minutes per PR |
| 34 | + |
| 35 | +### 1.3 Rust-secrets waste elimination |
| 36 | +- **Problem:** 300+ repos run `rust-secrets` but have NO `Cargo.toml` (aspasia, bgp-backbone-lab, branch-newspaper, etc.) |
| 37 | +- **Action:** Add path filter to rust-secrets job: |
| 38 | +  ```yaml |
| 39 | +  rust-secrets: |
| 40 | +    steps:  # guard each scan step, after checkout; hashFiles() is not available in a job-level `if:` |
| 41 | +      - if: hashFiles('**/Cargo.toml') != ''  # pull_request.changed_files is an integer count, not a path list |
| 42 | +  ``` |
| 43 | +- **Estimated savings:** ~300 repos × 3 s = 900 s (15 minutes) per estate-wide push |
| 44 | + |
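A minimal sketch of a step-level guard for 1.3, assuming the job checks out the repository first. `hashFiles()` is evaluated against the checked-out workspace and cannot be used in a job-level `if:`, and `github.event.pull_request.changed_files` is an integer count rather than a list of paths, so the condition sits on the scan step itself. The step name and the `run` command are placeholders, not the contents of the real `secret-scanner-reusable.yml`.

```yaml
jobs:
  rust-secrets:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Skipped in repos that have no Cargo.toml anywhere in the tree.
      - name: Scan Rust sources (placeholder step)
        if: hashFiles('**/Cargo.toml') != ''
        run: echo "run the existing rust-secrets scan here"
```

With this shape the job still starts and checks out the repo in non-Rust repos, so it saves scan time rather than runner start-up time.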
| 45 | +### 1.4 Estate-wide workflow audit |
| 46 | +| Workflow | Purpose | Redundant? | Action | |
| 47 | +|----------|---------|------------|--------| |
| 48 | +| `build-validation.yml` | Quick build check | YES (rust-ci + elixir-ci cover it) | DELETE or merge | |
| 49 | +| `rust-ci.yml` | Full Rust CI | NO | Keep, add path filters | |
| 50 | +| `elixir-ci.yml` | Full Elixir CI | NO | Keep, add path filters | |
| 51 | +| `secret-scanner.yml` | Secret detection | NO (but had trufflehog+gitleaks overlap) | ✅ Fixed | |
| 52 | +| `codeql.yml` | Security analysis | NO | Keep, schedule weekly | |
| 53 | +| `scorecard.yml` | Supply chain | NO | Keep, schedule weekly | |
| 54 | +| `hypatia-scan.yml` | Neurosymbolic | NO | Keep, schedule weekly | |
| 55 | + |
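The "schedule weekly" action listed for `codeql.yml`, `scorecard.yml`, and `hypatia-scan.yml` could be expressed with a cron trigger. This is only a sketch: the day and time are arbitrary assumptions, and `workflow_dispatch` is added so a scan can still be started by hand.

```yaml
# Sketch of a weekly trigger for a scan workflow such as codeql.yml
on:
  schedule:
    - cron: '17 3 * * 1'   # Mondays at 03:17 UTC (assumed slot)
  workflow_dispatch:        # allow manual runs between scheduled scans
```

Picking an off-the-hour minute, and varying it between workflows, keeps the estate's scans from all queuing at the same moment.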
| 56 | +--- |
| 57 | + |
| 58 | +## Track 2: Workflow Naming + Clarity (Week 1-2) |
| 59 | + |
| 60 | +### 2.1 Standardize naming convention |
| 61 | +**Current chaos:** |
| 62 | +- `rust-ci.yml` vs `elixir-ci.yml` vs `build-validation.yml` (inconsistent) |
| 63 | +- `secret-scanner.yml` vs `security-scan.yml` (overlap) |
| 64 | +- `scorecard.yml` vs `scorecard-enforcer.yml` (unclear difference) |
| 65 | + |
| 66 | +**Proposed standard:** |
| 67 | +``` |
| 68 | +<language>-<purpose>.yml |
| 69 | + OR |
| 70 | +<purpose>-<language>.yml |
| 71 | + |
| 72 | +Examples: |
| 73 | +- rust-build-test.yml |
| 74 | +- elixir-build-test.yml |
| 75 | +- security-secret-scan.yml |
| 76 | +- security-codeql.yml |
| 77 | +- security-scorecard.yml |
| 78 | +- governance-license.yml |
| 79 | +- governance-workflow-linter.yml |
| 80 | +``` |
| 81 | + |
| 82 | +### 2.2 Add descriptive metadata |
| 83 | +Every workflow should have: |
| 84 | +```yaml |
| 85 | +# SPDX-License-Identifier: MPL-2.0 |
| 86 | +name: Rust — Build + Test + Lint |
| 87 | +# Purpose: Validates Rust code compiles, passes tests, clippy lint |
| 88 | +# Owner: @hyperpolymath |
| 89 | +# Schedule: On push/PR (path-filtered to Rust files) |
| 90 | +# Timeout: 30m total |
| 91 | +# Est. Cost: 5-8 minutes per run |
| 92 | +``` |
| 93 | + |
| 94 | +--- |
| 95 | + |
| 96 | +## Track 3: Path Filter Optimization (Week 2) |
| 97 | + |
| 98 | +### 3.1 Language-specific filtering |
| 99 | +```yaml |
| 100 | +# rust-ci.yml |
| 101 | +on: |
| 102 | +  push: |
| 103 | +    paths: |
| 104 | +      - '**.rs' |
| 105 | +      - 'Cargo.toml' |
| 106 | +      - 'Cargo.lock' |
| 107 | +      - 'rust-toolchain' |
| 108 | +      - '.github/workflows/rust-ci.yml' |
| 109 | +  pull_request: |
| 110 | +    paths: |
| 111 | +      - '**.rs' |
| 112 | +      - 'Cargo.toml' |
| 113 | +      - 'Cargo.lock' |
| 114 | +``` |
| 115 | + |
| 116 | +### 3.2 Docs-only PR fast path |
| 117 | +```yaml |
| 118 | +# For workflows that can skip docs-only changes (`changed_files` is a count, so filter on paths instead): |
| 119 | +on: |
| 120 | +  pull_request: |
| 121 | +    paths-ignore: ['**.md', '**.adoc'] |
| 122 | +``` |
| 123 | + |
| 124 | +--- |
| 125 | + |
| 126 | +## Track 4: Test + Bench Standards (Week 2-3) |
| 127 | + |
| 128 | +### 4.1 Audit existing test standards |
| 129 | +- **Location:** `standards/.github/workflows/` + `standards/templates/` |
| 130 | +- **Check:** |
| 131 | +  - Are test workflows applied estate-wide? |
| 132 | +  - Are there contradictions between repos? |
| 133 | +  - Are panic-attack tests integrated? |
| 134 | +  - Are benchmarks proven/safe? |
| 135 | + |
| 136 | +### 4.2 Panic-Attack integration |
| 137 | +- **Current:** `panic-attack` workflow exists but may not be in all repos |
| 138 | +- **Action:** Add to all repos that have Rust code |
| 139 | +- **Patterns to detect:** |
| 140 | +  - Unsafe blocks |
| 141 | +  - Unwraps/expects |
| 142 | +  - Integer overflows |
| 143 | +  - Race conditions |
| 144 | + |
| 145 | +### 4.3 Proven Tests Repo (Idris2) |
| 146 | +**New repo:** `proven-tests-and-benches` |
| 147 | + |
| 148 | +**Purpose:** Formal guarantees for test correctness using Idris2 |
| 149 | + |
| 150 | +**Coverage targets:** |
| 151 | +- Echo-type safety tests |
| 152 | +- Identity/projection/invariance traversal |
| 153 | +- Set concepts |
| 154 | +- Interdimensional transfer |
| 155 | +- Higher-order constructs |
| 156 | + |
| 157 | +**Properties to prove:** |
| 158 | +```idris |
| 159 | +-- Tests are valid (Idris2 has no `Prop` universe; propositions are stated as `Type`) |
| 160 | +TestValid : (t : Test) -> Type |
| 161 | +TestValid t = ... |
| 162 | + |
| 163 | +-- Tests are sound (catch what they claim) |
| 164 | +TestSound : (t : Test) -> Type |
| 165 | +TestSound t = ... |
| 166 | + |
| 167 | +-- Tests are tamper-proof |
| 168 | +TestTamperProof : (t : Test) -> Type |
| 169 | +TestTamperProof t = ... |
| 170 | + |
| 171 | +-- Tests are unpanickable |
| 172 | +TestUnpanickable : (t : Test) -> Type |
| 173 | +TestUnpanickable t = ... |
| 174 | +``` |
| 175 | + |
| 176 | +--- |
| 177 | + |
| 178 | +## Track 5: Acceleration Opportunities (Week 3-4) |
| 179 | + |
| 180 | +### 5.1 Self-hosted runners for heavy work |
| 181 | +- **Targets:** fuzzing, E2E, Hypatia scans |
| 182 | +- **Estimated savings:** 80-90% for targeted workflows |
| 183 | +- **Security:** Rootless Podman containers |
| 184 | + |
| 185 | +### 5.2 Caching optimization |
| 186 | +- **Current:** Some repos use `Swatinem/rust-cache`, others don't |
| 187 | +- **Action:** Standardize caching across all language workflows |
| 188 | +- **Targets:** |
| 189 | +  - Rust: `cargo` cache + `target/` directory |
| 190 | +  - Elixir: `deps/` + `_build/` |
| 191 | +  - Node: `node_modules/` |
| 192 | +  - Go: `go.mod` hash-based |
| 193 | + |
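As an illustration of what a standardized caching block might look like for the Rust and Elixir targets above, here is a hedged sketch. It assumes `Swatinem/rust-cache@v2` (already used in some repos) and the generic `actions/cache@v4` action; the cache key layout for Elixir is an assumption, not an existing estate convention.

```yaml
steps:
  - uses: actions/checkout@v4

  # Rust: caches cargo registry/git data and target/ with keys derived from the project
  - uses: Swatinem/rust-cache@v2

  # Elixir: cache deps/ and _build/, keyed on mix.lock (assumed key scheme)
  - uses: actions/cache@v4
    with:
      path: |
        deps
        _build
      key: ${{ runner.os }}-mix-${{ hashFiles('**/mix.lock') }}
      restore-keys: |
        ${{ runner.os }}-mix-
```

The `restore-keys` prefix lets a run start from the most recent cache when `mix.lock` has changed, instead of rebuilding from scratch.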
| 194 | +### 5.3 Matrix strategy optimization |
| 195 | +- **Problem:** Many workflows test every combination (Rust nightly/stable/beta × OS) |
| 196 | +- **Action:** Reduce matrix for PRs, full matrix for nightly/main |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +## Track 6: Estate-Wide Workflow Inventory (Week 1) |
| 201 | + |
| 202 | +### 6.1 Generate current inventory |
| 203 | +```bash |
| 204 | +# List all unique workflow files |
| 205 | +find . -path "*/.github/workflows/*.yml" -type f | sort | uniq |
| 206 | + |
| 207 | +# Count per workflow name |
| 208 | +find . -path "*/.github/workflows/*.yml" -type f | xargs -I {} basename {} | sort | uniq -c | sort -rn |
| 209 | + |
| 210 | +# Identify byte-identical copies (same SHA-256, different paths); prints one path per duplicate group |
| 211 | +find . -path "*/.github/workflows/*.yml" -type f -exec sha256sum {} \; | awk '{print $1, $2}' | sort | uniq -d -w 64 |
| 212 | +``` |
| 213 | + |
| 214 | +### 6.2 Categorize workflows |
| 215 | +| Category | Workflow | Count | Action | |
| 216 | +|----------|----------|-------|--------| |
| 217 | +| Security | secret-scanner.yml | 1,191 | ✅ Optimized | |
| 218 | +| Security | codeql.yml | ~500 | Schedule weekly | |
| 219 | +| Security | scorecard.yml | ~500 | Schedule weekly | |
| 220 | +| Security | hypatia-scan.yml | ~500 | Schedule weekly | |
| 221 | +| Build | rust-ci.yml | ~200 | Path filter | |
| 222 | +| Build | elixir-ci.yml | ~50 | Path filter | |
| 223 | +| Build | build-validation.yml | ~50 | DELETE | |
| 224 | +| Governance | governance.yml | ~1000 | N/A | |
| 225 | +| Lint | workflow-linter.yml | ~500 | N/A | |
| 226 | +| Mirror | mirror.yml | ~300 | N/A | |
| 227 | + |
| 228 | +--- |
| 229 | + |
| 230 | +## Immediate Action Items (Next 48 Hours) |
| 231 | + |
| 232 | +1. ✅ **DONE:** Remove trufflehog from secret-scanner-reusable.yml |
| 233 | +2. **TODO:** Merge `build-validation.yml` into language-specific CI workflows in verisimdb |
| 234 | +3. **TODO:** Add path filters to rust-secrets job in secret-scanner-reusable.yml |
| 235 | +4. **TODO:** Create `proven-tests-and-benches` repo skeleton with Idris2 proofs |
| 236 | +5. **TODO:** Open per-repo issues for workflow consolidation (verisimdb, then propagate) |
| 237 | + |
| 238 | +--- |
| 239 | + |
| 240 | +## Success Metrics |
| 241 | + |
| 242 | +| Metric | Current | Target (4 weeks) | Target (12 weeks) | |
| 243 | +|--------|---------|------------------|-------------------| |
| 244 | +| CI minutes/month | ~50,000 | <25,000 (-50%) | <15,000 (-70%) | |
| 245 | +| Workflow count | ~1,200 | <800 | <600 | |
| 246 | +| PR merge time | ~30min | <15min | <10min | |
| 247 | +| Test coverage | ~80% | >90% | >95% | |
| 248 | +| Proven tests | 0 | >10 modules | >50 modules | |
| 249 | + |
| 250 | +--- |
| 251 | + |
| 252 | +## Tags |
| 253 | + |
| 254 | +`cicd-optimization`, `cost-reduction`, `performance`, `security`, `ultra-zotta-plan`, `estate-wide`, `high-priority` |
| 255 | + |
| 256 | +--- |
| 257 | + |
| 258 | +## Related Issues |
| 259 | + |
| 260 | +- [ ] Track 1: Immediate Redundancy Elimination |
| 261 | +- [ ] Track 2: Workflow Naming + Clarity |
| 262 | +- [ ] Track 3: Path Filter Optimization |
| 263 | +- [ ] Track 4: Test + Bench Standards |
| 264 | +- [ ] Track 5: Acceleration Opportunities |
| 265 | +- [ ] Track 6: Estate-Wide Workflow Inventory |