Skip to content

Latest commit

 

History

History
165 lines (110 loc) · 8.47 KB

File metadata and controls

165 lines (110 loc) · 8.47 KB

Stability Contract · 0.x

What agents and CI integrations can rely on across versions of Agents Shipgate.

This document is the contract. If the runtime ever diverges from what's documented here, that's a bug — please file an issue.


What WILL NOT change in 0.x

CLI command surface

These commands and flags are stable across all 0.x.y releases. They will only change in a major version bump (1.0.0):

Command Stable flags
agents-shipgate scan -c, --config, --out, --format, --ci-mode, --fail-on, --baseline, --no-plugins, --verbose, --workspace
agents-shipgate init --workspace, --write, --json
agents-shipgate doctor -c, --config, --workspace, --json, --verbose
agents-shipgate explain <check_id>, --no-plugins, --json
agents-shipgate list-checks --json, --no-plugins
agents-shipgate baseline save -c, --config, --out
agents-shipgate fixture list --json
agents-shipgate fixture run <name>, --ci-mode, --out
agents-shipgate fixture copy <name>, --to
agents-shipgate fixture verify <name>
agents-shipgate self-check --json

Exit codes

Code Meaning
0 Pass — advisory mode or strict mode with no fail_on matches
2 Manifest config error (missing/typo/invalid)
3 Input parse error (malformed YAML/JSON, file too large, path traversal blocked)
4 Other Agents Shipgate error
20 Strict-mode gate failure (≥ 1 unsuppressed finding hit fail_on)

JSON report fields (stable)

In agents-shipgate-reports/report.json, the following are guaranteed:

  • report_schema_version — bumps minor on additive changes, major on breaking
  • release_decision.{decision, reason, blockers, review_items, evidence_coverage, baseline_delta, fail_policy} (v0.8+)
  • release_decision.fail_policy.{ci_mode, fail_on, new_findings_only, would_fail_ci, exit_code}
  • release_decision.blockers[].{id, fingerprint, check_id, severity, title, baseline_status} and release_decision.review_items[].{id, fingerprint, check_id, severity, title, baseline_status} (reference-only — both arrays share the same item shape; full Finding payload is in findings[])
  • summary.{critical_count, high_count, medium_count, low_count, info_count, suppressed_count, status, human_review_recommended}
  • findings[].{id, fingerprint, check_id, severity, category, title, recommendation, suppressed}
  • findings[].tool_name (string or null)
  • findings[].source.{type, ref, location} (when available)
  • baseline.{matched_count, new_count, resolved_count, path} (when --baseline is used)
  • tool_inventory[].{name, source_type, source_ref, risk_tags, auth_scopes, owner, confidence}
  • loaded_plugins[].{name, value, distribution, version, check_id}

release_decision.decision vs summary.status

These are intentionally different signals, kept apart for backwards compatibility:

Field Baseline-aware? Recommended for release gating?
release_decision.decision yes — baseline-matched criticals appear in review_items, not blockers yes (v0.8+)
summary.status no — any unsuppressed critical flips status to release_blockers_detected preserved for v0.7 callers

Concretely: a scan with one baseline-matched critical and zero new findings produces summary.status = "release_blockers_detected" AND release_decision.decision = "review_required". Both are correct under their respective contracts. New consumers should read release_decision.decision.

Check IDs

Once a check ID ships in a tagged release (SHIP-POLICY-APPROVAL-MISSING, SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING, etc.), it will not be:

  • Renamed
  • Removed (only deprecated, with at least one minor-version cycle)
  • Repurposed (the conditions under which it fires may narrow but never broaden in a way that breaks existing suppressions)

New check IDs may be added in any minor release. If your CI pins severities by check ID, expect new checks to surface as new findings.

Fingerprint algorithm

fingerprint = "fp_" + sha256(check_id | tool_name | canonical_evidence)[:16]

Where canonical_evidence:

  • Sorts dict keys recursively
  • Sorts list items by JSON repr
  • Excludes the default_severity audit-evidence key (so applying severity_overrides does not change identity)

Fingerprints are stable across runs on the same input. They are the identity primitive used by suppressions and baselines.

Trust-model invariants

The scanner does not, under any circumstances:

  • Execute or import user code (the SDK loaders use ast.parse only)
  • Make HTTP requests
  • Connect to MCP servers
  • Invoke LLMs
  • Send telemetry

Plugins are off by default. AGENTS_SHIPGATE_ENABLE_PLUGINS=1 enables loading; --no-plugins overrides at the CLI level. When loaded, every plugin is enumerated in report.loaded_plugins.

Manifest schema

The manifest schema version (version: "0.1") is independent of the CLI version. Manifest schema changes follow their own deprecation cycle. A 0.1-shaped manifest will load correctly across all 0.x.y CLI releases.

Fixture names

Fixture names listed by agents-shipgate fixture list are stable. Names will not be renamed. New fixtures may be added.

Agent-skill paths

The following paths are part of the public agent surface and will not move within 0.x:

The body content of these files may change to reflect new prompts; the entry-point paths will not.


What MAY change additively in any minor release

These are not stable — assume they may grow but not shrink:

  • Risk-tag taxonomy. New tags may appear (e.g. infrastructure_change, code_execution). Existing tags' meanings will not change.
  • Report frameworks.{name} blocks. New framework summaries (e.g. frameworks.langchain) may appear.
  • Manifest fields. New optional fields under existing sections.
  • Check default severities. May tighten over time. To pin a severity for your repo, use checks.severity_overrides.

What MAY change in any minor release

These are explicitly NOT part of the public contract:

  • Internal module layout under src/agents_shipgate/. Importing from non-public modules will break.
  • Markdown report layout. Section ordering, exact wording, and table format may change. Parse the JSON report instead.
  • Risk classifier keyword sets in core/risk_hints.py. False positives are tuned over time. To pin specific behavior, use risk_overrides.tools.{tool}.{tags,remove_tags} in your manifest.
  • Default init template. The starter manifest format may grow new sections.
  • CheckMetadata.evidence_fields content. New keys may be added to a check's evidence dict.

If you need stability guarantees beyond what's listed here, please open an issue describing the use case.


Versioning

We follow SemVer loosely:

  • Patch (0.5.x): bug fixes only. No new features, no breaking changes.
  • Minor (0.x.0): new features (new checks, new input loaders, new flags). Adheres to this contract.
  • Major (1.0.0): may break the contract. Will be announced with a migration guide.

The current version is in pyproject.toml. Changelog is in CHANGELOG.md.


Reporting a contract violation

If you encounter behavior that contradicts this document — for example, an unsuppressed finding for a deprecated check ID, or a stable JSON field that disappeared — please open an issue with:

  1. The version of agents-shipgate (agents-shipgate --version)
  2. The expected behavior per this document
  3. The observed behavior (output, error message, JSON fragment)

Stability bugs are prioritized.