DRAFT: v1 Spec#805

Draft
jonstacks wants to merge 2 commits into main from v1-spec

@jonstacks
Collaborator

What

We are working toward a v1 of the ngrok-operator. This PR introduces a formal specification (/specs) that defines the operator's public-facing surface area as we stabilize it for a 1.0 release.

The spec covers:

CRDs (specs/crds/)

Stable API contracts for each Custom Resource:

  • AgentEndpoint, CloudEndpoint, BoundEndpoint — endpoint types and their schemas
  • TrafficPolicy, Domain, IPPolicy, KubernetesOperator — supporting resources
  • Shared patterns: conditions, finalizers, and common types

Annotations (specs/annotations.md)

A central reference for all ngrok.com/ annotations — what they do, which resources they apply to, and how they interact (e.g. pooling-enabled, mapping strategy).

Helm Chart (specs/helm/)

Desired state for values.yaml as a first-class API:

  • Global config, credentials, and the config file system
  • Per-component values: apiManager, agent, bindingsForwarder
  • Feature flags, drain/domain policies, and the cleanup hook

Features (specs/features/)

Behavior specifications for cross-cutting capabilities:

  • Draining — graceful resource cleanup on uninstall
  • Bindings — projecting external ngrok endpoints into the cluster as Kubernetes Services
  • Ingress & Gateway API — how those controllers work and how they map to ngrok endpoints
  • High Availability — replica counts, leader election, PDBs
  • Traffic Policy — resolution and precedence across controllers
  • Multi-install — running multiple operator instances in the same cluster
  • Namespace watching — scoping the operator to specific namespaces

Controllers (specs/controllers/)

Reconciliation behavior for each controller, including error handling, drain awareness, and how individual controllers interact with the Driver pattern (Ingress/Gateway API).

RBAC (specs/rbac/)

Permission sets for each component (operator, agent, bindings forwarder) and aggregation roles.

Authentication (specs/authentication.md)

Credentials, secrets management, API key vs. authtoken handling.

Design Decisions (specs/design-decisions.md)

Settled architectural trade-offs — notably:

  • Following Kubernetes API conventions throughout
  • Deferring field validation to the ngrok API rather than duplicating it in admission webhooks

Why

The operator's API has grown organically. Before cutting a v1, we need a stable, documented contract so that:

  1. We can distinguish bugs from unspecified behavior
  2. Breaking changes are intentional and tracked
  3. Contributors have an authoritative reference before opening PRs

The spec directory acts as a living document — expect iterations, comments, and open questions as we work toward sign-off.

How

This is a review-first PR. The spec was generated from the current operator state and then hand-edited to reflect the desired v1 behavior. Feedback via PR comments is the primary workflow — treat each spec file as its own review surface.

No code changes are included in this PR.

Breaking Changes

None — this PR contains specifications only. Breaking changes may be introduced in subsequent implementation PRs that align the code to the spec for v1.

Signed-off-by: Jonathan Stacks <jonstacks@users.noreply.github.com>
@github-actions github-actions Bot added the size/XXL Denotes a PR that changes 1000+ lines label Apr 16, 2026
@jonstacks jonstacks self-assigned this Apr 16, 2026
@jonstacks jonstacks added the documentation Improvements or additions to documentation label Apr 16, 2026

| Field | Description |
|--------------------------|------------------------------------------|
| `assignedURL` | The URL assigned by ngrok |
Collaborator Author

These descriptions are a bit short. Should we include more details? For example, if setting url: https://, the assignedURL will be a fully expanded URL based on our rules. Worth noting here?


## Overview

The endpoint bindings feature allows ngrok endpoints to be "bound" into a Kubernetes cluster, projecting external ngrok endpoints as local Kubernetes services. This enables traffic from ngrok to flow directly to services inside the cluster.
Collaborator Author

This is a bit misleading. This is for projecting endpoints that exist in ngrok into your cluster. For example, you could have an endpoint https://service-a.namespace-a that will appear as a k8s service named service-a in namespace-a.

The benefit to this is that pods in your cluster talk directly to a kubernetes service and that traffic is then securely tunneled to ngrok without needing to expose the target service on the internet.
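
The naming example in this comment can be sketched in Go. This is only an illustration under a stated assumption: that a bound endpoint's hostname always has the two-label form `<service>.<namespace>`. The `ServiceTarget` type and `TargetFromEndpointURL` function are hypothetical names, not part of the operator.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ServiceTarget is a hypothetical type naming the Kubernetes Service
// that a bound endpoint would be projected as.
type ServiceTarget struct {
	Name      string
	Namespace string
}

// TargetFromEndpointURL splits the URL's hostname into service name and
// namespace. It assumes the host is exactly "<service>.<namespace>";
// anything else is rejected rather than guessed at.
func TargetFromEndpointURL(raw string) (ServiceTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return ServiceTarget{}, fmt.Errorf("parse %q: %w", raw, err)
	}
	host := u.Hostname()
	parts := strings.Split(host, ".")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return ServiceTarget{}, fmt.Errorf("host %q is not of the form <service>.<namespace>", host)
	}
	return ServiceTarget{Name: parts[0], Namespace: parts[1]}, nil
}

func main() {
	target, err := TargetFromEndpointURL("https://service-a.namespace-a")
	if err != nil {
		panic(err)
	}
	fmt.Printf("service=%s namespace=%s\n", target.Name, target.Namespace)
}
```

Rejecting hosts with more or fewer than two labels keeps the sketch honest about its assumption; the actual spec may allow other hostname shapes.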


The endpoint bindings feature allows ngrok endpoints to be "bound" into a Kubernetes cluster, projecting external ngrok endpoints as local Kubernetes services. This enables traffic from ngrok to flow directly to services inside the cluster.

**Status:** In development (`features.bindings.enabled: false` by default).
Collaborator Author

This feature is no longer in development for v1. We should also check whether anything else is marked as in development across the rest of the specs.

Open Question: For v1, do we want this to be on by default or keep it as opt-in still? We're leaning towards on by default.

Comment on lines +23 to +25
| BoundEndpoint CRD | Represents an endpoint bound from ngrok to a service |
| BoundEndpoint controller | Creates target and upstream services for each binding |
| Bindings Forwarder | Bridges connections via mTLS to the ngrok ingress |
Collaborator Author

Would be nice to link to other parts of the spec here.

## Flow

1. The operator registers with the ngrok API as a Kubernetes operator with bindings enabled.
2. The ngrok API pushes bound endpoint information to the operator via a poller.
Collaborator Author

"pushes" & "poller" are a bit in conflict here.

Comment thread specs/README.md
| Term | Definition | Details |
|------|-----------|---------|
| **Agent Endpoint** | An endpoint backed by an ngrok agent tunnel running inside the cluster. Created as an `AgentEndpoint` CR. | [crds/agentendpoint.md](crds/agentendpoint.md) |
| **Bindings** | A feature that projects external ngrok endpoints into the cluster as local Kubernetes Services, enabling ngrok-to-cluster traffic. | [features/bindings.md](features/bindings.md) |
Collaborator

Suggested change
| **Bindings** | A feature that projects external ngrok endpoints into the cluster as local Kubernetes Services, enabling ngrok-to-cluster traffic. | [features/bindings.md](features/bindings.md) |
| **Bindings** | A feature that projects private ngrok endpoints into the cluster as local Kubernetes Services, enabling pods to send requests to them without those endpoints being on the public internet. | [features/bindings.md](features/bindings.md) |

Comment thread specs/README.md
| Term | Definition | Details |
|------|-----------|---------|
| **Agent Endpoint** | An endpoint backed by an ngrok agent tunnel running inside the cluster. Created as an `AgentEndpoint` CR. | [crds/agentendpoint.md](crds/agentendpoint.md) |
| **Bindings** | A feature that projects external ngrok endpoints into the cluster as local Kubernetes Services, enabling ngrok-to-cluster traffic. | [features/bindings.md](features/bindings.md) |
Collaborator

The other items in this table say how they are created via specific CRs or annotations. It might be worth noting how here as well. Something like "Both Agent Endpoints and Cloud Endpoints can be bound to kubernetes with a bindings=kubernetes value and can be created via CRs, the api, dashboard, etc with the same k8s cluster or other cluster, etc" probably more concisely though :)

Collaborator

they also all link to more detailed docs so maybe the docs cover it fine

Comment thread specs/annotations.md
Comment on lines +21 to +34
### `ngrok.com/mapping-strategy`

Controls which ngrok endpoint resources are created for a given resource.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Allowed values | `endpoints`, `endpoints-verbose` |
| Default | `endpoints` |

- `endpoints`: Creates only an `AgentEndpoint`.
- `endpoints-verbose`: Creates both a `CloudEndpoint` and an internal `AgentEndpoint` (URL ending in `.internal`).

See: [controllers/service.md](controllers/service.md), [controllers/ingress.md](controllers/ingress.md)
Collaborator

The mapping strategy might warrant its own spec/doc. While it's clear what the verbose one does, not supplying one and letting it be dynamic is a bit confusing. And the underlying reason for it existing is something we've talked about but would be good to write down.
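
As a starting point for writing that down, here is a minimal Go sketch of the mapping as the quoted table states it. It covers only the two listed values and the documented default of `endpoints`; it does not model the dynamic behavior mentioned in this comment. The `EndpointKinds` function, the treatment of an empty value as unset, and the error on unknown values are all assumptions made for illustration.

```go
package main

import "fmt"

// MappingStrategyAnnotation is the annotation key quoted in the spec.
const MappingStrategyAnnotation = "ngrok.com/mapping-strategy"

// EndpointKinds returns the endpoint resource kinds the quoted table
// associates with the annotation value. A missing or empty value falls
// back to "endpoints", the documented default. Returning an error for
// unknown values is an assumption, not documented behavior.
func EndpointKinds(annotations map[string]string) ([]string, error) {
	strategy := annotations[MappingStrategyAnnotation]
	if strategy == "" {
		strategy = "endpoints"
	}
	switch strategy {
	case "endpoints":
		return []string{"AgentEndpoint"}, nil
	case "endpoints-verbose":
		// The AgentEndpoint here is the internal one (URL ending in ".internal").
		return []string{"CloudEndpoint", "AgentEndpoint"}, nil
	default:
		return nil, fmt.Errorf("unsupported %s value %q", MappingStrategyAnnotation, strategy)
	}
}

func main() {
	kinds, err := EndpointKinds(map[string]string{
		MappingStrategyAnnotation: "endpoints-verbose",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}
```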

Comment thread specs/README.md
Comment on lines +40 to +57
### [helm/](helm/) — Helm Chart Configuration

- [common.md](helm/common.md) — Top-level structure, global defaults, ngrok config, credentials, config file system
- [operator.md](helm/operator.md) — API manager (`apiManager`) deployment values
- [agent.md](helm/agent.md) — Agent deployment values
- [bindings-forwarder.md](helm/bindings-forwarder.md) — Bindings forwarder (`bindingsForwarder`) deployment values
- [features.md](helm/features.md) — Feature flags, feature config, drain/domain policies, cleanup hook

### [features/](features/) — Cross-Cutting Features

- [draining.md](features/draining.md) — Drain policy, cleanup hook, uninstall behavior
- [multi-install.md](features/multi-install.md) — Multiple operator installations in one cluster
- [ingress.md](features/ingress.md) — Kubernetes Ingress feature
- [gateway-api.md](features/gateway-api.md) — Kubernetes Gateway API feature
- [bindings.md](features/bindings.md) — Endpoint bindings feature
- [high-availability.md](features/high-availability.md) — Replicas, leader election, PDB
- [traffic-policy.md](features/traffic-policy.md) — Traffic policy resolution across controllers
- [namespace-watching.md](features/namespace-watching.md) — Namespace scoping configuration
Collaborator

I think this is good for now, and this is just a hunch/thought, but I'm thinking long term we'll want more of the cross-cutting specs and fewer "helm" specs. Like most of these features have some impact across the stack from the helm values to install, how the crd is used, and how the reconciler works and deals with different scenarios. It feels like those should be documented mostly in here.

It probably still makes sense to have some helm specs, but I think it might not get into every helm value for the agent-manager for example, but instead document how we want something like "Global helm values that can be overridden per component, and separation of k8s configs from controller configs" and might use some examples

Comment thread specs/README.md
Comment on lines +62 to +68
- [agentendpoint.md](crds/agentendpoint.md) — AgentEndpoint (`ngrok.com`)
- [cloudendpoint.md](crds/cloudendpoint.md) — CloudEndpoint (`ngrok.com`)
- [kubernetesoperator.md](crds/kubernetesoperator.md) — KubernetesOperator (`ngrok.com`)
- [trafficpolicy.md](crds/trafficpolicy.md) — TrafficPolicy (`ngrok.com`)
- [domain.md](crds/domain.md) — Domain (`ngrok.com`)
- [ippolicy.md](crds/ippolicy.md) — IPPolicy (`ngrok.com`)
- [boundendpoint.md](crds/boundendpoint.md) — BoundEndpoint (`ngrok.com`)
Collaborator

Nit: I was confused what all the ngrok.com stuff was for. I'm guessing this is supposed to be the API group and is shown here as a relic of the fact that the existing system is not consistent? Might be worth removing if they are consistent in this new world.

The Ingress controller uses a Driver that collects state from multiple Ingress resources and produces the desired set of ngrok endpoints:

1. Each Ingress is stored in the driver's internal state.
2. `Driver.Sync()` considers all stored Ingresses, Services, and Domains to generate the correct set of `AgentEndpoint` and/or `CloudEndpoint` resources.
Collaborator

seems like we should have a whole document on the driver and store?
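
To give such a document something concrete to start from, the store-then-sync shape described in the quoted steps might look roughly like the Go sketch below. Everything in it is hypothetical: the pared-down `Ingress` type, the method names other than `Sync`, and especially the choice to produce one endpoint URL per unique host. The real driver also considers Services and Domains, which are omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// Ingress is a hypothetical, pared-down stand-in for a Kubernetes Ingress.
type Ingress struct {
	Namespace string
	Name      string
	Hosts     []string
}

// Driver keeps every Ingress it has been told about in an in-memory store.
type Driver struct {
	ingresses map[string]Ingress
}

func NewDriver() *Driver {
	return &Driver{ingresses: make(map[string]Ingress)}
}

func storeKey(namespace, name string) string {
	return namespace + "/" + name
}

// UpdateIngress records or replaces an Ingress in the store.
func (d *Driver) UpdateIngress(ing Ingress) {
	d.ingresses[storeKey(ing.Namespace, ing.Name)] = ing
}

// DeleteIngress removes an Ingress from the store.
func (d *Driver) DeleteIngress(namespace, name string) {
	delete(d.ingresses, storeKey(namespace, name))
}

// Sync recomputes the desired endpoint URLs from all stored Ingresses,
// not just the one that triggered reconciliation. In this sketch, hosts
// shared by several Ingresses collapse into a single URL.
func (d *Driver) Sync() []string {
	seen := make(map[string]struct{})
	for _, ing := range d.ingresses {
		for _, host := range ing.Hosts {
			seen[host] = struct{}{}
		}
	}
	urls := make([]string, 0, len(seen))
	for host := range seen {
		urls = append(urls, "https://"+host)
	}
	sort.Strings(urls) // stable output regardless of map iteration order
	return urls
}

func main() {
	d := NewDriver()
	d.UpdateIngress(Ingress{Namespace: "default", Name: "a", Hosts: []string{"app.example.com"}})
	d.UpdateIngress(Ingress{Namespace: "default", Name: "b", Hosts: []string{"app.example.com", "api.example.com"}})
	fmt.Println(d.Sync())
}
```

The point of the pattern is that the desired state is always derived from the whole store, so deleting one Ingress and calling `Sync()` again yields the correct remaining set without per-resource bookkeeping.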

## Special Cases

- **Internal domains**: Domains with URLs ending in `.internal` are not managed in the ngrok API. The controller removes the finalizer and takes no further action.
- **DNS provisioning**: Custom domains may take time for DNS to propagate. The exponential backoff rate limiter handles retries during this period.
Collaborator

Might also be worth stating that if DNS is never set up for a custom domain, it will requeue forever and show not ready.
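
Both special cases can be illustrated with a small Go sketch. The `.internal` suffix check follows the quoted spec text. The backoff function is a generic capped exponential, used here as a stand-in for whatever rate limiter the controller actually uses; its base and cap values are hypothetical. Because the delay is capped rather than the retry count, a domain whose DNS is never configured would keep being requeued at the cap interval, which matches the "requeue forever" behavior noted above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// isInternalDomain reports whether a domain ends in ".internal". Per the
// quoted spec, such domains are not managed in the ngrok API.
func isInternalDomain(domain string) bool {
	return strings.HasSuffix(domain, ".internal")
}

// backoff returns base doubled once per prior failure, capped at max.
// This is a generic capped exponential, not the operator's rate limiter.
func backoff(failures int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < failures; i++ {
		delay *= 2
		if delay >= max {
			return max
		}
	}
	if delay > max {
		return max
	}
	return delay
}

func main() {
	fmt.Println(isInternalDomain("my-service.internal"))
	fmt.Println(isInternalDomain("app.example.com"))

	// Hypothetical base of 1s and cap of 1m, chosen only for illustration.
	for failures := 0; failures <= 8; failures += 2 {
		fmt.Printf("failures=%d delay=%s\n", failures, backoff(failures, time.Second, time.Minute))
	}
}
```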

@@ -0,0 +1,56 @@
# Domain Controller

## Executive Summary
Collaborator

should probably talk about the reclaim policy in here


## Base Controller Pattern

All ngrok-operator controllers use `BaseController[T]`, a generic base type that implements the standard reconciliation pattern.
Collaborator

most


1. Ensure the associated Domain exists via `DomainManager.EnsureDomainExists()`.
2. Fetch the traffic policy (inline or by name).
3. Create or update the cloud endpoint via the ngrok API.
Collaborator

Might be worth being specific that we do this regardless of whether the domain is ready, since a domain that is not ready because its cert is bad is still usable.
