89 changes: 89 additions & 0 deletions specs/README.md
@@ -0,0 +1,89 @@
# ngrok-operator Specifications

This directory contains the v1 specifications for the ngrok-operator. These specs document all public-facing behavior, APIs, and configuration — serving as the authoritative reference for what the operator does and how it behaves.

## Purpose

1. **Implementation planning**: Provides a baseline for planning changes as we work toward v1.
2. **Bug vs. unspecified behavior**: Defines expected behavior so we can distinguish bugs from unspecified behavior.
3. **Living documentation**: Serves as the canonical reference for the operator's public surface area.

## Glossary

| Term | Definition | Details |
|------|-----------|---------|
| **Agent Endpoint** | An endpoint backed by an ngrok agent tunnel running inside the cluster. Created as an `AgentEndpoint` CR. | [crds/agentendpoint.md](crds/agentendpoint.md) |
| **Bindings** | A feature that projects external ngrok endpoints into the cluster as local Kubernetes Services, enabling ngrok-to-cluster traffic. | [features/bindings.md](features/bindings.md) |
| **Cloud Endpoint** | An endpoint managed entirely in the ngrok cloud (no local agent tunnel). Created as a `CloudEndpoint` CR. | [crds/cloudendpoint.md](crds/cloudendpoint.md) |
| **Drain / Draining** | The process of gracefully removing ngrok API resources when the operator is uninstalled. Triggered by deletion of the `KubernetesOperator` CR. | [features/draining.md](features/draining.md) |
| **Driver** | An internal pattern used by Ingress and Gateway API controllers that collects state from multiple resources and materializes the combined state as ngrok endpoints. | [controllers/ingress.md](controllers/ingress.md) |
| **Endpoint Pooling** | Allows multiple endpoints to share the same public URL, distributing traffic across them. Controlled via the `ngrok.com/pooling-enabled` annotation. | [annotations.md](annotations.md) |
| **Mapping Strategy** | Controls which ngrok endpoint resources are created: `endpoints` (AgentEndpoint only) or `endpoints-verbose` (CloudEndpoint + internal AgentEndpoint). | [annotations.md](annotations.md) |
| **Traffic Policy** | A set of rules (rate limiting, header manipulation, authentication, etc.) applied to ngrok endpoints. Defined as an `TrafficPolicy` CR or inline JSON. | [features/traffic-policy.md](features/traffic-policy.md) |

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix article before TrafficPolicy type name.

Line 22 should use “a TrafficPolicy CR” (not “an TrafficPolicy CR”).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/README.md` at line 22, The article before the type name is incorrect:
change “an `TrafficPolicy` CR” to “a `TrafficPolicy` CR” in the sentence
referencing the TrafficPolicy type; locate the phrase that includes
`TrafficPolicy` and replace the leading article "an" with "a" so it reads "a
`TrafficPolicy` CR".


## Directory Structure

### Top-Level Specs

- [authentication.md](authentication.md) — Credentials, secrets, API key/authtoken management
- [annotations.md](annotations.md) — Central reference for all `ngrok.com/` annotations
- [design-decisions.md](design-decisions.md) — Settled architectural trade-offs and their rationale

### [rbac/](rbac/) — RBAC Configuration

- [README.md](rbac/README.md) — RBAC model overview and design principles
- [operator.md](rbac/operator.md) — Operator (api-manager) permissions
- [agent.md](rbac/agent.md) — Agent permissions
- [bindings-forwarder.md](rbac/bindings-forwarder.md) — Bindings forwarder permissions
- [aggregation.md](rbac/aggregation.md) — Editor/viewer aggregation roles

### [helm/](helm/) — Helm Chart Configuration

- [common.md](helm/common.md) — Top-level structure, global defaults, ngrok config, credentials, config file system
- [operator.md](helm/operator.md) — API manager (`apiManager`) deployment values
- [agent.md](helm/agent.md) — Agent deployment values
- [bindings-forwarder.md](helm/bindings-forwarder.md) — Bindings forwarder (`bindingsForwarder`) deployment values
- [features.md](helm/features.md) — Feature flags, feature config, drain/domain policies, cleanup hook

### [features/](features/) — Cross-Cutting Features

- [draining.md](features/draining.md) — Drain policy, cleanup hook, uninstall behavior
- [multi-install.md](features/multi-install.md) — Multiple operator installations in one cluster
- [ingress.md](features/ingress.md) — Kubernetes Ingress feature
- [gateway-api.md](features/gateway-api.md) — Kubernetes Gateway API feature
- [bindings.md](features/bindings.md) — Endpoint bindings feature
- [high-availability.md](features/high-availability.md) — Replicas, leader election, PDB
- [traffic-policy.md](features/traffic-policy.md) — Traffic policy resolution across controllers
- [namespace-watching.md](features/namespace-watching.md) — Namespace scoping configuration

### [crds/](crds/) — Custom Resource Definitions

- [common.md](crds/common.md) — Shared patterns: conditions, finalizers, shared types
- [agentendpoint.md](crds/agentendpoint.md) — AgentEndpoint (`ngrok.com`)
- [cloudendpoint.md](crds/cloudendpoint.md) — CloudEndpoint (`ngrok.com`)
- [kubernetesoperator.md](crds/kubernetesoperator.md) — KubernetesOperator (`ngrok.com`)
- [trafficpolicy.md](crds/trafficpolicy.md) — TrafficPolicy (`ngrok.com`)
- [domain.md](crds/domain.md) — Domain (`ngrok.com`)
- [ippolicy.md](crds/ippolicy.md) — IPPolicy (`ngrok.com`)
- [boundendpoint.md](crds/boundendpoint.md) — BoundEndpoint (`ngrok.com`)
Comment on lines +62 to +68

Collaborator

Nit: I was confused about what all the `ngrok.com` stuff was for. I'm guessing this is supposed to be the API group, shown here as a relic of the fact that the existing system is not consistent? Might be worth removing if the groups are consistent in this new world.

Collaborator Author

TODO: remove the ngrok.com here


### [controllers/](controllers/) — Controller Behavior

- [common.md](controllers/common.md) — Base controller pattern, error handling, drain awareness
- [service.md](controllers/service.md) — Service LoadBalancer controller
- [ingress.md](controllers/ingress.md) — Ingress controller
- [agentendpoint.md](controllers/agentendpoint.md) — AgentEndpoint controller
- [cloudendpoint.md](controllers/cloudendpoint.md) — CloudEndpoint controller
- [kubernetesoperator.md](controllers/kubernetesoperator.md) — KubernetesOperator controller
- [trafficpolicy.md](controllers/trafficpolicy.md) — TrafficPolicy controller
- [domain.md](controllers/domain.md) — Domain controller
- [ippolicy.md](controllers/ippolicy.md) — IPPolicy controller
- [boundendpoint.md](controllers/boundendpoint.md) — BoundEndpoint controller
- [bindings-forwarder.md](controllers/bindings-forwarder.md) — Bindings Forwarder controller
- **[gateway-api/](controllers/gateway-api/)** — Gateway API controllers
  - [gatewayclass.md](controllers/gateway-api/gatewayclass.md)
  - [gateway.md](controllers/gateway-api/gateway.md)
  - [httproute.md](controllers/gateway-api/httproute.md)
  - [tcproute.md](controllers/gateway-api/tcproute.md)
  - [tlsroute.md](controllers/gateway-api/tlsroute.md)
  - [referencegrant.md](controllers/gateway-api/referencegrant.md)
99 changes: 99 additions & 0 deletions specs/annotations.md
@@ -0,0 +1,99 @@
# Annotations Reference

## Overview

The ngrok-operator uses annotations under the `ngrok.com/` prefix to configure behavior on Kubernetes resources. This document serves as a central reference. See individual CRD specs in [crds/](crds/) for full details on how each annotation is used.

## User-Configurable Annotations

### `ngrok.com/url`

Specifies the public URL for an endpoint.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer) |
| Default | (none — a dynamic TCP address is assigned) |
| Examples | `tcp://1.tcp.ngrok.io:12345`, `tcp://`, `tls://example.com` |

See: [controllers/service.md](controllers/service.md)
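
As an illustration, a LoadBalancer Service requesting a fixed TCP address might look like the sketch below. The resource name, selector, and ports are hypothetical, and `loadBalancerClass: ngrok` is an assumption not stated in this document; only the annotation key and its example value come from the table above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-tcp                # hypothetical name
  annotations:
    # Example value taken from the table above.
    ngrok.com/url: "tcp://1.tcp.ngrok.io:12345"
spec:
  type: LoadBalancer
  loadBalancerClass: ngrok         # assumption: class name is not given in this spec
  selector:
    app: example                   # hypothetical selector
  ports:
    - port: 5432
      targetPort: 5432
```

Omitting the annotation would, per the default above, leave the operator to assign a dynamic TCP address.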

### `ngrok.com/mapping-strategy`

Controls which ngrok endpoint resources are created for a given resource.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Allowed values | `endpoints`, `endpoints-verbose` |
| Default | `endpoints` |

- `endpoints`: Creates only an `AgentEndpoint`.
- `endpoints-verbose`: Creates both a `CloudEndpoint` and an internal `AgentEndpoint` (URL ending in `.internal`).

See: [controllers/service.md](controllers/service.md), [controllers/ingress.md](controllers/ingress.md)
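
A minimal sketch of opting an Ingress into the verbose strategy follows. The Ingress name, host, backend Service, and `ingressClassName` value are assumptions for illustration; only the annotation key and its allowed values come from this section.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress            # hypothetical name
  annotations:
    # Creates a CloudEndpoint plus an internal AgentEndpoint
    # instead of a single AgentEndpoint.
    ngrok.com/mapping-strategy: endpoints-verbose
spec:
  ingressClassName: ngrok          # assumption: class name is not given in this spec
  rules:
    - host: example.ngrok.app      # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example      # hypothetical backend Service
                port:
                  number: 80
```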

### `ngrok.com/traffic-policy`

References an `TrafficPolicy` resource in the same namespace to apply to the created endpoint(s).

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Value | Name of an `TrafficPolicy` resource |
Comment on lines +38 to +43

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix TrafficPolicy article usage in this section.

Use “a TrafficPolicy resource” (not “an TrafficPolicy resource”) in both Line 38 and Line 43 for consistency with API type naming.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/annotations.md` around lines 38 - 43, The wording uses "an
`TrafficPolicy` resource" incorrectly; update both occurrences in the section
that describes referencing a TrafficPolicy to read "a `TrafficPolicy` resource"
instead, ensuring consistency with the API type name `TrafficPolicy` wherever
that phrase appears in the paragraph describing Applies to/Value for the created
endpoint(s).

| Default | (none) |

When `mapping-strategy` is `endpoints-verbose`, the traffic policy is applied to the `CloudEndpoint`. When `endpoints`, it is applied to the `AgentEndpoint`.

See: [features/traffic-policy.md](features/traffic-policy.md)
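
The interaction with `mapping-strategy` can be sketched as a metadata fragment. The resource name, namespace, and policy name `rate-limit` are hypothetical; the fragment would sit on a Service, Ingress, or Gateway route.

```yaml
metadata:
  name: example                    # hypothetical resource name
  namespace: apps                  # the TrafficPolicy must live in this same namespace
  annotations:
    # Name of a TrafficPolicy resource in namespace "apps" (hypothetical name).
    ngrok.com/traffic-policy: rate-limit
    # With endpoints-verbose, the policy is applied to the CloudEndpoint;
    # with the default (endpoints), it would be applied to the AgentEndpoint.
    ngrok.com/mapping-strategy: endpoints-verbose
```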

### `ngrok.com/pooling-enabled`

Controls whether the endpoint allows pooling with other endpoints sharing the same URL.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Allowed values | `"true"`, `"false"` |
| Default | (none — uses ngrok platform default) |
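
A short fragment showing the quoting, since Kubernetes annotation values must be strings and an unquoted YAML `true` would be parsed as a boolean:

```yaml
metadata:
  annotations:
    # Quoted so YAML treats the value as the string "true", not a boolean.
    ngrok.com/pooling-enabled: "true"
```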

### `ngrok.com/description`

Sets a human-readable description on the ngrok endpoint resource.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Default | `"Created by the ngrok-operator"` |

### `ngrok.com/metadata`

Sets arbitrary key-value metadata on the ngrok endpoint resource. Value is a JSON object string that is parsed into `map[string]string`. Merged with operator-level ``ngrok.metadata``; annotation keys take precedence on conflict.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize inline code formatting for ngrok.metadata.

Line 71 uses double backticks around ngrok.metadata; single backticks are sufficient and render consistently.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/annotations.md` at line 71, Replace the double-backtick inline code
formatting around ngrok.metadata with single backticks to normalize inline code
style in the sentence describing metadata merging; locate the occurrence of
``ngrok.metadata`` in the specs/annotations.md text and change it to
`ngrok.metadata`.


| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Default | `{"owned-by": "ngrok-operator"}` |
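
As a sketch of the merge rule described in this section, consider the fragment below. The keys `team` and `env` and the operator-level values in the comments are hypothetical, chosen only to show a conflict.

```yaml
metadata:
  annotations:
    # A JSON object encoded as a string; single quotes avoid escaping the inner double quotes.
    ngrok.com/metadata: '{"team": "payments", "env": "staging"}'

# Suppose the operator-level ngrok.metadata resolved to (hypothetical):
#   {"env": "prod", "owned-by": "ngrok-operator"}
# Under the stated rule (annotation keys win on conflict), the endpoint would carry:
#   {"env": "staging", "owned-by": "ngrok-operator", "team": "payments"}
```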

### `ngrok.com/bindings`

Controls traffic visibility for an endpoint. Comma-separated list of binding types.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Applies to | `Service` (LoadBalancer), `Ingress`, `Gateway` routes |
| Allowed values | `public`, `internal`, `kubernetes` |
| Default | (none — uses ngrok platform default) |
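
A fragment illustrating the comma-separated form, using two of the allowed values listed above (whether a given combination is accepted by the ngrok platform is not specified here):

```yaml
metadata:
  annotations:
    # Comma-separated list of binding types.
    ngrok.com/bindings: "internal,kubernetes"
```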

## Internal Annotations (set by the operator)

### `ngrok.com/computed-url`

Set by the Service LoadBalancer controller as the single source of truth for the externally reachable URL. Users should not set this annotation.

| Detail | Value |
|-----------------|--------------------------------------------------------|
| Set on | `Service` (LoadBalancer) |
| Example | `tcp://5.tcp.ngrok.io:12345`, `tls://example.com:443` |

See: [controllers/service.md](controllers/service.md)
75 changes: 75 additions & 0 deletions specs/authentication.md
@@ -0,0 +1,75 @@
# Authentication

## Overview

The ngrok-operator authenticates with the ngrok API using two credentials:

- **API Key** (`NGROK_API_KEY`): Used for ngrok API access to manage resources (domains, endpoints, IP policies, etc.)
- **Auth Token** (`NGROK_AUTHTOKEN`): Used for ngrok agent authentication to establish tunnels

Both credentials are required for the operator to function.

## Credential Storage

Credentials are stored in a Kubernetes Secret in the operator's namespace. The Secret contains two keys:

| Secret Key | Description |
|--------------|------------------------------------|
| `API_KEY` | ngrok API key |
| `AUTHTOKEN` | ngrok auth token |

## Providing Credentials

### Via Helm Values (recommended for initial setup)

When installing via Helm, credentials can be provided directly:

```yaml
credentials:
  apiKey: "<your-api-key>"
  authtoken: "<your-authtoken>"
```

When both values are provided, the Helm chart creates a Secret with the generated name `<release-name>-ngrok-operator-credentials` (or the name specified in `credentials.secret.name`).

### Via Pre-existing Secret

If credentials are managed externally (e.g., by a secrets manager), create the Secret before installing the operator:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-ngrok-credentials
  namespace: <operator-namespace>
type: Opaque
data:
  API_KEY: <base64-encoded-api-key>
  AUTHTOKEN: <base64-encoded-authtoken>
```

Then reference it in Helm values:

```yaml
credentials:
  secret:
    name: my-ngrok-credentials
```

When `credentials.apiKey` and `credentials.authtoken` are both empty, the Helm chart does not create a Secret and expects the named Secret to already exist.

## Credential Consumption

The operator pods mount the Secret as environment variables:

- The **api-manager** (main controller) uses `NGROK_API_KEY` for all ngrok API operations.
- The **agent-manager** uses `NGROK_AUTHTOKEN` for establishing agent tunnels.
- The **bindings-forwarder** uses `NGROK_AUTHTOKEN` for its connections.
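
The mapping from Secret keys to environment variables can be sketched as a container `env` fragment. This is an illustration built from the key and variable names above, not the chart's actual template; the Secret name reuses the `my-ngrok-credentials` example.

```yaml
# Sketch of a container spec fragment (not the chart's real template).
env:
  - name: NGROK_API_KEY            # consumed by the api-manager
    valueFrom:
      secretKeyRef:
        name: my-ngrok-credentials
        key: API_KEY
  - name: NGROK_AUTHTOKEN          # consumed by the agent-manager and bindings-forwarder
    valueFrom:
      secretKeyRef:
        name: my-ngrok-credentials
        key: AUTHTOKEN
```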

## mTLS for Bindings

When the bindings feature is enabled, the operator generates a self-signed TLS certificate and creates a Certificate Signing Request (CSR) with the ngrok API. This certificate is stored in a Secret (default name: `default-tls`) in the operator's namespace and is used for mTLS communication between the bindings forwarder and ngrok's ingress endpoint.

## One-Click Demo Mode

When `oneClickDemoMode: true` is set, the operator starts without requiring credentials. It will report as Ready but will not actually connect to the ngrok API. This mode is intended for demonstration purposes only.

Collaborator

I wonder if this deserves its own page as well. I wouldn't say, for example, that it's for demonstration purposes only. My understanding was that it was for marketplaces that require a way of installing an operator without any user-provided input.

This doc reads more like human documentation than a requirement bullet such as "The operator MUST have one of the three Helm properties provided: demo mode, credential name, or the two credentials". Maybe it's just the terseness of the old specs/reqs I was used to versus these. I'm not sure how to suggest a solid change, though; maybe it's something we can iterate on. Or if we end up finding a good format we like, we can put it into an agent.md, custom agent, or skill. There are probably open source skills for writing specs, so we could try taking these and transforming them into different flavors and seeing what we like.

Collaborator Author

You're correct that it was implemented for marketplace installs. If we ever moved to an operator model, where agents are CRDs, like Datadog's, we could largely drop this.

I think to resolve this, we just remove the "for demonstration purposes only" statement from the spec. The reason doesn't much matter for the spec. What matters is that with `oneClickDemoMode` set, the operator doesn't fail for lack of credentials, becomes ready, and waits until credentials are supplied.

60 changes: 60 additions & 0 deletions specs/controllers/agentendpoint.md
@@ -0,0 +1,60 @@
# AgentEndpoint Controller

## Executive Summary

The AgentEndpoint controller reconciles `AgentEndpoint` resources by creating and managing ngrok agent endpoints. It ensures associated domains exist, resolves traffic policies, fetches client certificates, and keeps endpoint status up to date.

## Watches

| Resource | Relation | Predicate |
|-----------------------|------------|----------------------------------------------|
| `AgentEndpoint` | Primary | AnnotationChanged or GenerationChanged |
| `TrafficPolicy` | Secondary | Indexed by `spec.trafficPolicyName`; DELETE events filtered |
| `Secret` | Secondary | Indexed by client certificate refs; DELETE events filtered |
| `Domain` | Owned | All events |

## Reconciliation Flow

1. Ensure the associated Domain exists via `DomainManager.EnsureDomainExists()`.
2. Fetch the traffic policy (by reference or inline).
3. Fetch client certificates from referenced Secrets.
4. Create or update the ngrok agent endpoint via `AgentDriver`.
5. Update status conditions and fields.
6. Call `ReconcileStatus()`.

## Created Resources

- ngrok agent endpoint (via AgentDriver API)
- `Domain` CR (via DomainManager)

## Status

| Field | Description |
|--------------------------|------------------------------------------|
| `assignedURL` | The URL assigned by ngrok |
| `attachedTrafficPolicy` | `"none"`, `"inline"`, or policy ref name |
| `domainRef` | Reference to the associated Domain CR |

## Conditions

| Type | Description |
|--------------------|--------------------------------------------------|
| `EndpointCreated` | Whether the ngrok agent endpoint was created |
| `TrafficPolicy` | Whether the traffic policy was applied |
| `DomainReady` | Whether the associated Domain is ready |
| `Ready` | Aggregates all conditions and domain status |

## Events

- `Creating` / `Created`
- `Updating` / `Updated`
- `Deleting` / `Deleted`
- Error variants for each operation

## Error Handling

| Error | Behavior |
|--------------------------------|------------------------|
| `ErrInvalidTrafficPolicyConfig`| No requeue |
| `ErrDomainNotReady` | Requeue after 10s |
| Default | Via `CtrlResultForErr` |
34 changes: 34 additions & 0 deletions specs/controllers/bindings-forwarder.md
@@ -0,0 +1,34 @@
# Bindings Forwarder Controller

## Executive Summary

The Bindings Forwarder controller manages TCP listeners for `BoundEndpoint` resources. It bridges incoming connections from the upstream service through mTLS to the ngrok ingress endpoint.

## Watches

| Resource | Relation | Predicate |
|------------------|----------|-----------|
| `BoundEndpoint` | Primary | Manual setup (unmanaged controller) |

## Reconciliation Flow

1. Fetch the KubernetesOperator CR for binding configuration and the ingress endpoint address.
2. Fetch the TLS Secret for mTLS authentication.
3. Create a TLS dialer with the client certificate.
4. Listen on the allocated port for the BoundEndpoint.
5. For each incoming connection:
   - Look up the source Pod by client IP (via field indexer on `status.podIP`).
   - Upgrade the connection to a binding connection via mux protocol.
   - Join the client connection with the ngrok ingress endpoint connection.
6. Close the listener when the BoundEndpoint is deleted.

## Created Resources

- TCP listeners (in-process, not Kubernetes resources)

## Notes

- This controller runs in the bindings-forwarder deployment, not the api-manager.
- Leader election is disabled for the bindings-forwarder.
- The controller uses a `statusID()` implementation that returns namespace/name (always non-empty), so the Update handler is always used.
- See [features/bindings.md](../features/bindings.md) for the full feature overview.