What agents and CI integrations can rely on across versions of Agents Shipgate.
This document is the contract. If the runtime ever diverges from what's documented here, that's a bug — please file an issue.
These commands and flags are stable across all 0.x.y releases. They will only change in a major version bump (1.0.0):
| Command | Stable flags |
|---|---|
agents-shipgate scan |
-c, --config, --out, --format, --ci-mode, --fail-on, --baseline, --no-plugins, --verbose, --workspace |
agents-shipgate init |
--workspace, --write, --json |
agents-shipgate doctor |
-c, --config, --workspace, --json, --verbose |
agents-shipgate explain |
<check_id>, --no-plugins, --json |
agents-shipgate list-checks |
--json, --no-plugins |
agents-shipgate baseline save |
-c, --config, --out |
agents-shipgate fixture list |
--json |
agents-shipgate fixture run |
<name>, --ci-mode, --out |
agents-shipgate fixture copy |
<name>, --to |
agents-shipgate fixture verify |
<name> |
agents-shipgate self-check |
--json |
| Code | Meaning |
|---|---|
0 |
Pass — advisory mode or strict mode with no fail_on matches |
2 |
Manifest config error (missing/typo/invalid) |
3 |
Input parse error (malformed YAML/JSON, file too large, path traversal blocked) |
4 |
Other Agents Shipgate error |
20 |
Strict-mode gate failure (≥ 1 unsuppressed finding hit fail_on) |
In agents-shipgate-reports/report.json, the following are guaranteed:
report_schema_version— bumps minor on additive changes, major on breakingrelease_decision.{decision, reason, blockers, review_items, evidence_coverage, baseline_delta, fail_policy}(v0.8+)release_decision.fail_policy.{ci_mode, fail_on, new_findings_only, would_fail_ci, exit_code}release_decision.blockers[].{id, fingerprint, check_id, severity, title, baseline_status}andrelease_decision.review_items[].{id, fingerprint, check_id, severity, title, baseline_status}(reference-only — both arrays share the same item shape; full Finding payload is infindings[])summary.{critical_count, high_count, medium_count, low_count, info_count, suppressed_count, status, human_review_recommended}findings[].{id, fingerprint, check_id, severity, category, title, recommendation, suppressed}findings[].tool_name(string or null)findings[].source.{type, ref, location}(when available)baseline.{matched_count, new_count, resolved_count, path}(when--baselineis used)tool_inventory[].{name, source_type, source_ref, risk_tags, auth_scopes, owner, confidence}loaded_plugins[].{name, value, distribution, version, check_id}
These are intentionally different signals, kept apart for backwards compatibility:
| Field | Baseline-aware? | Recommended for release gating? |
|---|---|---|
release_decision.decision |
yes — baseline-matched criticals appear in review_items, not blockers |
yes (v0.8+) |
summary.status |
no — any unsuppressed critical flips status to release_blockers_detected |
preserved for v0.7 callers |
Concretely: a scan with one baseline-matched critical and zero new findings produces summary.status = "release_blockers_detected" AND release_decision.decision = "review_required". Both are correct under their respective contracts. New consumers should read release_decision.decision.
Once a check ID ships in a tagged release (SHIP-POLICY-APPROVAL-MISSING, SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING, etc.), it will not be:
- Renamed
- Removed (only deprecated, with at least one minor-version cycle)
- Repurposed (the conditions under which it fires may narrow but never broaden in a way that breaks existing suppressions)
New check IDs may be added in any minor release. If your CI pins severities by check ID, expect new checks to surface as new findings.
fingerprint = "fp_" + sha256(check_id | tool_name | canonical_evidence)[:16]
Where canonical_evidence:
- Sorts dict keys recursively
- Sorts list items by JSON repr
- Excludes the
default_severityaudit-evidence key (so applyingseverity_overridesdoes not change identity)
Fingerprints are stable across runs on the same input. They are the identity primitive used by suppressions and baselines.
The scanner does not, under any circumstances:
- Execute or import user code (the SDK loaders use
ast.parseonly) - Make HTTP requests
- Connect to MCP servers
- Invoke LLMs
- Send telemetry
Plugins are off by default. AGENTS_SHIPGATE_ENABLE_PLUGINS=1 enables loading; --no-plugins overrides at the CLI level. When loaded, every plugin is enumerated in report.loaded_plugins.
The manifest schema version (version: "0.1") is independent of the CLI version. Manifest schema changes follow their own deprecation cycle. A 0.1-shaped manifest will load correctly across all 0.x.y CLI releases.
Fixture names listed by agents-shipgate fixture list are stable. Names will not be renamed. New fixtures may be added.
The following paths are part of the public agent surface and will not move within 0.x:
prompts/— task-shaped recipes, individual filenames are stable.claude/commands/shipgate.md— Claude Code/shipgateslash commandskills/agents-shipgate/SKILL.md— Claude Code skill. Frontmatternameis fixed atagents-shipgate(deliberately distinct from the/shipgatecommand so the skill cannot preempt it). Trigger phrases indescriptionmay broaden additively but will not narrow.skills/agents-shipgate/prompts/andskills/agents-shipgate/ci-recipes/— bundled supporting files the skill references via relative paths. Filenames listed inSKILL.mdare stable.
The body content of these files may change to reflect new prompts; the entry-point paths will not.
These are not stable — assume they may grow but not shrink:
- Risk-tag taxonomy. New tags may appear (e.g.
infrastructure_change,code_execution). Existing tags' meanings will not change. - Report
frameworks.{name}blocks. New framework summaries (e.g.frameworks.langchain) may appear. - Manifest fields. New optional fields under existing sections.
- Check default severities. May tighten over time. To pin a severity for your repo, use
checks.severity_overrides.
These are explicitly NOT part of the public contract:
- Internal module layout under
src/agents_shipgate/. Importing from non-public modules will break. - Markdown report layout. Section ordering, exact wording, and table format may change. Parse the JSON report instead.
- Risk classifier keyword sets in
core/risk_hints.py. False positives are tuned over time. To pin specific behavior, userisk_overrides.tools.{tool}.{tags,remove_tags}in your manifest. - Default
inittemplate. The starter manifest format may grow new sections. CheckMetadata.evidence_fieldscontent. New keys may be added to a check's evidence dict.
If you need stability guarantees beyond what's listed here, please open an issue describing the use case.
We follow SemVer loosely:
- Patch (
0.5.x): bug fixes only. No new features, no breaking changes. - Minor (
0.x.0): new features (new checks, new input loaders, new flags). Adheres to this contract. - Major (
1.0.0): may break the contract. Will be announced with a migration guide.
The current version is in pyproject.toml. Changelog is in CHANGELOG.md.
If you encounter behavior that contradicts this document — for example, an unsuppressed finding for a deprecated check ID, or a stable JSON field that disappeared — please open an issue with:
- The version of
agents-shipgate(agents-shipgate --version) - The expected behavior per this document
- The observed behavior (output, error message, JSON fragment)
Stability bugs are prioritized.