Document health-probe status-code contract and Control Plane probes (#4053)#4063
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughDocumentation for the node-renderer health-checks page gains two additions: a status-code contract section with a table and ChangesNode Renderer Health-Check Documentation
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis is a documentation-only PR that adds two new sections to
Confidence Score: 4/5Documentation-only change with no runtime risk; the new sections are accurate against the worker.ts source, and both findings are editorial gaps rather than incorrect guidance. The status-code contract table and callout are correct and match the worker.ts implementation exactly. The CPLN YAML snippet, however, promises 'readiness / liveness' in its comment but omits the liveness block, leaving readers with an incomplete configuration. The pre-existing h2c section example still hands users docs/oss/building-features/node-renderer/health-checks.md — the CPLN snippet and the h2c section example both need small follow-up edits. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Container starts] --> B{Renderer process up?}
B -- No --> DEAD[Container never starts]
B -- Yes --> C{Probe type?}
C -- "curl --fail /health or /info" --> D["Always 200 ✓"]
D --> READY[Probe passes — container ready]
C -- "curl --fail /ready (no warm-up path)" --> E{Bundle compiled?}
E -- "Yes (200)" --> READY
E -- "No (503 waiting_for_bundle)" --> F[Probe fails — non-zero exit]
F --> DEAD2[Container never becomes ready — deadlock]
C -- "curl --fail /ready (with warm-up path)" --> G[Warm-up delivers first render]
G --> E
style DEAD fill:#f66,color:#fff
style DEAD2 fill:#f66,color:#fff
style READY fill:#6c6,color:#fff
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[Container starts] --> B{Renderer process up?}
B -- No --> DEAD[Container never starts]
B -- Yes --> C{Probe type?}
C -- "curl --fail /health or /info" --> D["Always 200 ✓"]
D --> READY[Probe passes — container ready]
C -- "curl --fail /ready (no warm-up path)" --> E{Bundle compiled?}
E -- "Yes (200)" --> READY
E -- "No (503 waiting_for_bundle)" --> F[Probe fails — non-zero exit]
F --> DEAD2[Container never becomes ready — deadlock]
C -- "curl --fail /ready (with warm-up path)" --> G[Warm-up delivers first render]
G --> E
style DEAD fill:#f66,color:#fff
style DEAD2 fill:#f66,color:#fff
style READY fill:#6c6,color:#fff
|
| ```yaml | ||
| # Control Plane workload — readiness / liveness as a Command probe | ||
| readiness: | ||
| exec: | ||
| command: | ||
| - curl | ||
| - -sf | ||
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge | ||
| - http://localhost:3800/health | ||
| ``` |
There was a problem hiding this comment.
Missing liveness probe in CPLN snippet
The prose says "Use /health (always 200) for liveness and readiness by default" and the comment on the code block says "readiness / liveness as a Command probe", but only the readiness key is shown. A user copying this snippet will configure readiness without liveness, contrary to the intent described in the surrounding text. A corresponding liveness block pointing at /health should appear alongside readiness in the example.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c93a5ec155
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| ```yaml | ||
| # Control Plane workload — readiness / liveness as a Command probe | ||
| readiness: |
There was a problem hiding this comment.
Use Control Plane probe field names
For Control Plane manifests this block needs to be attached to the workload container as readinessProbe/livenessProbe; the rest of our CPLN docs use spec.containers[].readinessProbe (for example, docs/oss/deployment/docker-deployment.md:365), and CPLN's probe schema is Kubernetes-style with exec/httpGet/tcpSocket/grpc inside that probe. As written, a user who copies this readiness: object into a workload manifest gets no valid Command probe, so the advertised readiness/liveness check will not run.
Useful? React with 👍 / 👎.
The "is this PR's CI ready?" rule was restated in prose across pr-batch, adversarial-pr-review, and pr-processing: run `gh pr checks <PR> --required`, fall back to the full `gh pr checks <PR>` list when no required checks exist, ignore cancelled/superseded rows, and treat an empty list as UNKNOWN/not-ready. Encapsulate that rule in `.agents/skills/pr-batch/bin/pr-ci-readiness`, written as pure Ruby (stdlib json + open3, no bundler/gems) mirroring the fetch-pr-review-data template: a PrCiReadiness module of pure module_function verdict helpers plus a Runner class for the CLI. Subprocesses use Open3 array args (no shell); `gh pr checks` stdout is captured despite its non-zero exit on failing/pending checks, with stderr discarded, and the rows are inspected to print a verdict of READY, NOT_READY, or UNKNOWN with failing/pending names and a required_used flag. Flags: --json (default)/--text/--self-check/--help. This replaces the original bash implementation. The helpers already hard-depended on Ruby (the bash versions shelled out to `ruby -rjson`), so going pure Ruby removes the bash<->Ruby seam at zero dependency cost. Output JSON shape, executable path/name, and CLI flags are unchanged, so the SKILL.md / workflow references stay valid and need no edits. The companion test is now minitest (pr-ci-readiness-test.rb, replacing the -test.bash): pure verdict logic is tested directly with canned check arrays, and the CLI/Runner path is exercised through a fake `gh` on PATH. Verified output is byte-for-byte identical to the previous bash version against merged PR #4058 and in-progress PR #4063. rubocop and minitest both pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code ReviewVerdict: Approve with one minor suggestion This is a well-written, technically accurate documentation PR that cleanly resolves issue #4053. The root cause analysis (cold-start What's working well
Minor issueThe CPLN YAML comment says Note on curl HTTP/2 requirementThe CPLN section uses |
| # Control Plane workload — readiness / liveness as a Command probe | ||
| readiness: | ||
| exec: | ||
| command: | ||
| - curl | ||
| - -sf | ||
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge | ||
| - http://localhost:3800/health | ||
| ``` | ||
|
|
There was a problem hiding this comment.
The comment # Control Plane workload — readiness / liveness as a Command probe advertises both probe types, but only readiness is defined in the snippet. A reader following this literally would have liveness unconfigured.
Either show both keys (same curl command, same /health path) or narrow the comment so it doesn't imply liveness is covered:
| # Control Plane workload — readiness / liveness as a Command probe | |
| readiness: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| ``` | |
| ```yaml | |
| # Control Plane workload — readiness as a Command probe | |
| # (apply the same shape to liveness if desired) | |
| readiness: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health |
c93a5ec to
7469f72
Compare
Code Review — PR #4063Documentation-only change. The root-cause analysis is correct and the additions are accurate and well-written. A few things worth addressing before merge: 1. Missing
|
| periodSeconds: 5 | ||
| failureThreshold: 3 |
There was a problem hiding this comment.
Missing timeoutSeconds here. If CPLN applies the standard Kubernetes default (1 s), the orchestrator will terminate the curl process before its own --max-time 3 fires, causing every probe to time out even when /health is healthy. Add timeoutSeconds: 5 to match the /ready exec probe pattern used elsewhere in this doc.
| periodSeconds: 5 | |
| failureThreshold: 3 | |
| periodSeconds: 5 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 |
| periodSeconds: 10 | ||
| failureThreshold: 3 |
There was a problem hiding this comment.
Same issue: timeoutSeconds is missing. Without it, the orchestrator's default (1 s in Kubernetes) races against --max-time 3 and wins, killing the probe before curl finishes.
| periodSeconds: 10 | |
| failureThreshold: 3 | |
| periodSeconds: 10 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 |
| - -sf | ||
| - --max-time |
There was a problem hiding this comment.
Consider -sfS (--silent --fail --show-error) instead of -sf. The -S/--show-error flag re-enables curl error messages even under --silent, so probe failures produce actionable output in container logs rather than silent non-zero exits. Same applies to the livenessProbe curl below.
| - -sf | |
| - --max-time | |
| - -sfS | |
| - --max-time |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7469f72281
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge |
There was a problem hiding this comment.
Add timeoutSeconds to the CPLN exec probes
For Control Plane command probes, timeoutSeconds defaults to 1s (matching the Kubernetes-style default documented in docs/oss/building-features/node-renderer/js-configuration.md:480), but this snippet lets curl wait up to 3s without raising the probe timeout. A copied manifest on a loaded renderer can have CPLN kill the exec probe after 1s, so the readiness/liveness check fails before the intended curl deadline; add timeoutSeconds: 5 to both probes as in the earlier exec-probe example.
Useful? React with 👍 / 👎.
|
Addressed the current-head review feedback in c871729 (on top of the corrected CPLN snippet in 7469f72):
Also regenerated |
c871729 to
d245889
Compare
Code Review — PR #4063Documentation-only change. The status-code claims are verified against Issues foundMinor (2)
Observations (non-blocking)
|
| | --------- | ---------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | | ||
| | `/health` | Always `200` (process can answer) | **Yes** — never non-2xx. | | ||
| | `/info` | Always `200` (returns Node and renderer versions) | **Yes** — never non-2xx. | | ||
| | `/ready` | `200` once a bundle is compiled; `503` `waiting_for_bundle` until then | **Only with a warm-up path.** Without one, `-f` turns the cold-start `503` into a failed probe. | |
There was a problem hiding this comment.
The cell `503` `waiting_for_bundle` until then reads as if waiting_for_bundle might be an HTTP status text. Consider making the JSON body explicit so it is unambiguous:
| | `/ready` | `200` once a bundle is compiled; `503` `waiting_for_bundle` until then | **Only with a warm-up path.** Without one, `-f` turns the cold-start `503` into a failed probe. | | |
| | `/ready` | `200` once a bundle is compiled; `503` with `{"status":"waiting_for_bundle"}` until then | **Only with a warm-up path.** Without one, `-f` turns the cold-start `503` into a failed probe. | |
| ```yaml | ||
| kind: workload | ||
| spec: | ||
| containers: | ||
| - name: node-renderer | ||
| # ... image, ports, env (RENDERER_ENABLE_HEALTH_ENDPOINTS: 'true') ... | ||
| # Command probe — h2c-aware curl against /health (always 200). | ||
| readinessProbe: | ||
| exec: | ||
| command: | ||
| - curl | ||
| - -sf | ||
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge | ||
| - http://localhost:3800/health | ||
| periodSeconds: 5 | ||
| failureThreshold: 3 | ||
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout | ||
| livenessProbe: | ||
| exec: | ||
| command: | ||
| - curl | ||
| - -sf | ||
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge | ||
| - http://localhost:3800/health | ||
| periodSeconds: 10 | ||
| failureThreshold: 3 | ||
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout | ||
| ``` |
There was a problem hiding this comment.
The CPLN YAML omits a startupProbe, while the renderer probe example in the linked docker-deployment.md includes one. Since this section probes /health (always 200), liveness cannot misfire during a cold start, so omitting startupProbe is safe — but the omission is silent. Consider either:
Option A — add an inline comment explaining the omission:
| ```yaml | |
| kind: workload | |
| spec: | |
| containers: | |
| - name: node-renderer | |
| # ... image, ports, env (RENDERER_ENABLE_HEALTH_ENDPOINTS: 'true') ... | |
| # Command probe — h2c-aware curl against /health (always 200). | |
| readinessProbe: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| periodSeconds: 5 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout | |
| livenessProbe: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| periodSeconds: 10 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout | |
| ``` | |
| ```yaml | |
| kind: workload | |
| spec: | |
| containers: | |
| - name: node-renderer | |
| # ... image, ports, env (RENDERER_ENABLE_HEALTH_ENDPOINTS: 'true') ... | |
| # No startupProbe needed: /health always returns 200, so liveness cannot | |
| # misfire during cold start. Add one if you switch the paths to /ready. | |
| # Command probe — h2c-aware curl against /health (always 200). | |
| readinessProbe: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| periodSeconds: 5 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout | |
| livenessProbe: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| periodSeconds: 10 | |
| failureThreshold: 3 | |
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout |
**Option B** — add a `startupProbe` for consistency with the sibling doc (a `tcpSocket` probe is sufficient, matching the Kubernetes section pattern).
Either works; the suggestion above just adds the comment.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d245889a01
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Control Plane exposes two relevant probe shapes: an **HTTP** probe and a **Command** (exec) probe. The HTTP probe is | ||
| HTTP/1.1 and **cannot speak the renderer's h2c listener** (see [h2c](#h2c-why-httpget-probes-do-not-work)), so it | ||
| always fails against these endpoints — use a **Command** probe with an h2c-aware curl instead. Command probes run | ||
| inside the container, so the default `localhost` binding works and no `0.0.0.0` host is required. |
There was a problem hiding this comment.
Qualify localhost guidance for separate workloads
In the separate node-renderer Control Plane workload topology, Rails connects over the workload service address, not from inside this container; the existing deployment guide says that case must use 0.0.0.0 for the renderer host (docs/oss/building-features/node-renderer/container-deployment.md:218-220). As written, this sentence can lead users to keep the default localhost binding because the exec probe passes, while real Rails→renderer traffic to the separate workload cannot connect.
Useful? React with 👍 / 👎.
Code Review — Health-probe status-code contract and Control Plane probesOverall: Well-written, directly actionable docs fix for issue #4053. The root cause analysis is accurate, the status-code table is immediately useful, and the CPLN section correctly identifies the h2c/HTTP/1.1 incompatibility. A few things to address before merging: Strengths
Notable gap: no
|
| - name: node-renderer | ||
| # ... image, ports, env (RENDERER_ENABLE_HEALTH_ENDPOINTS: 'true') ... | ||
| # Command probe — h2c-aware curl against /health (always 200). | ||
| readinessProbe: |
There was a problem hiding this comment.
The CPLN YAML example is missing a startupProbe, which the Kubernetes section includes to shield liveness during boot. Without it (and without initialDelaySeconds on liveness), a slow-starting renderer can be killed by the liveness probe before it finishes booting.
CPLN supports startupProbe in the same Kubernetes-style format. The tcpSocket form from the Kubernetes example can't be used here (no 0.0.0.0 binding), but the exec form against /health works:
| readinessProbe: | |
| startupProbe: | |
| exec: | |
| command: | |
| - curl | |
| - -sf | |
| - --max-time | |
| - '3' | |
| - --http2-prior-knowledge | |
| - http://localhost:3800/health | |
| initialDelaySeconds: 10 | |
| periodSeconds: 5 | |
| failureThreshold: 12 # 10 + (5 × 12) = 70 s; tune to your cold-start window | |
| timeoutSeconds: 5 | |
| # Command probe — h2c-aware curl against /health (always 200). | |
| readinessProbe: |
This mirrors the guidance in the Kubernetes section ("Startup: shield liveness while the renderer boots") and prevents the liveness probe from racing against cold start.
Issue #4053 reported that the recommended renderer readiness probe `curl --http2-prior-knowledge -fsS http://localhost:3800/ready` breaks container startup. Root cause: `-f`/`--fail` exits non-zero on any HTTP status >= 400, and `/ready` returns 503 (waiting_for_bundle) during the cold-start window until the answering worker compiles its first bundle. Pointing a --fail probe at /ready with no warm-up path therefore deadlocks startup. This is documented gating behavior, but the docs lacked an explicit per-endpoint status-code contract and Control Plane guidance. Changes to health-checks.md: - Add a Status-Code Contract section: /health and /info are always 200 (safe with --fail); /ready is 200/503 (only safe with --fail when a warm-up path exists). Includes a callout explaining why `curl -fsS .../ready` breaks startup and what to use instead. - Add a Control Plane (CPLN) section: CPLN offers HTTP and Command (exec) probes; the HTTP probe cannot speak the renderer's h2c listener, so use a Command probe with h2c-aware curl against /health by default, and only switch to /ready with a warm-up path. Docs-only change; no CHANGELOG entry per changelog guidelines. Fixes #4053 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address review threads on the Control Plane snippet in docs/oss/building-features/node-renderer/health-checks.md: - Use the Kubernetes-style readinessProbe / livenessProbe container fields that Control Plane actually accepts, matching the existing Control Plane deployment docs, instead of a flat top-level `readiness:` key that would not configure a valid probe. - Show both readinessProbe and livenessProbe (the comment previously advertised liveness but only defined readiness). Regenerate llms-full.txt so the docs change is reflected (fixes the stale check-llms-full CI gate). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `timeoutSeconds: 5` to the Control Plane readinessProbe and livenessProbe examples so the curl `--max-time 3` budget, not the orchestrator's default 1s probe timeout, owns the deadline — matching the existing /ready exec probe pattern in this doc. Without it the orchestrator can kill curl before it finishes, flaking a healthy probe. Regenerate llms-full.txt. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d245889 to
40ae833
Compare
Review: Health-Probe Status-Code Contract and CPLN DocsOverall: Clean, well-written documentation that directly addresses the root cause in #4053. The status-code contract table is concise and accurate, the blockquote callout puts the deadlock scenario front-and-center, and the CPLN section correctly explains why HTTP probes fail (h2c incompatibility) with a working exec probe example. A few things worth discussing: Missing
|
|
|
||
| ## Status-Code Contract | ||
|
|
||
| Probe tooling that uses `curl --fail` / `-f` (which `-sf` and `-fsS` both include) exits non-zero on any HTTP status |
There was a problem hiding this comment.
Mathematically correct (503 ≥ 400), but readers who think in 4xx vs 5xx buckets may not immediately realize this applies to /ready's 503. A small parenthetical would prevent that misread:
| Probe tooling that uses `curl --fail` / `-f` (which `-sf` and `-fsS` both include) exits non-zero on any HTTP status | |
| `>= 400`. Whether `--fail` is safe therefore depends on which endpoint you probe: |
Consider: `>= 400` (4xx and 5xx) — or replace with "any non-2xx/3xx status" which maps more directly to how developers think about "failure" HTTP responses.
| livenessProbe: | ||
| exec: | ||
| command: | ||
| - curl | ||
| - -sf | ||
| - --max-time | ||
| - '3' | ||
| - --http2-prior-knowledge | ||
| - http://localhost:3800/health | ||
| periodSeconds: 10 | ||
| failureThreshold: 3 | ||
| timeoutSeconds: 5 # exceed curl --max-time 3 so the probe, not the orchestrator, owns the timeout |
There was a problem hiding this comment.
The Kubernetes section includes a startupProbe that shields the liveness probe while the container boots (up to 70 s via initialDelaySeconds: 10 + failureThreshold: 12 × periodSeconds: 5). Without one here, the livenessProbe starts immediately and allows only failureThreshold: 3 × periodSeconds: 10 = 30 s before a restart — potentially too tight during image pulls or slow Fastify startup.
If CPLN supports startupProbe (Kubernetes-style workload definitions usually do), consider adding:
startupProbe:
exec:
command:
- curl
- -sf
- --max-time
- '3'
- --http2-prior-knowledge
- http://localhost:3800/health
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 12 # 10 + (5 × 12) = 70 s total; tune to your cold-start time
timeoutSeconds: 5If CPLN does not support startupProbe, a note explaining that and recommending a higher failureThreshold on the liveness probe would prevent readers from silently under-allocating startup time.
* origin/main: (40 commits) feat(pro): use built-in Rails i18n compiler for React Intl demo (#4128) Fix pr-merge-ledger UTF-8 crash under non-UTF-8 locale (#4123) Add canonical AI-agent prompts source (prompts.yml) (#4124) Local benchmark runner: raise server-boot timeout for slower machines (#4073) (#4125) Docs(generator): note Pro production devtool for source-mapped SSR stacks (#3893) (#4113) docs: add "Consuming an Unreleased Build" guide and fix pnpm git-subdir syntax (#4117) Address deferred AI-review feedback on PR-helper scripts (#4069) (#4105) Wrap generated demo file paths in onboarding page (Part 1 of #4062) (#4107) fix(ci): build bundle-size base from PR merge commit's first parent (#4110) Add internal RSC architecture deep-dive docs (RoR Pro vs Next.js) (#4006) Disable noisy automatic benchmark regression issue filing (#4071) (#4116) Release-train branching + phase-tiered merge gating (beta/RC/final) (#4018) Fix Webpack dependency selection in install generator (#4109) Document health-probe status-code contract and Control Plane probes (#4053) (#4063) Local dedicated-hardware benchmark runner (#4073) (#4088) docs(tooling): surface SVG diagram alt text in generated llms-full files (#4087) docs(agents): codify review-loop convergence + local/CI parity in PR-batch workflow (#4101) Split RenderFunction: drop the legacy renderer arm (#4096) Add OSS hydrate_on scheduling (#4037) Docs: fix stale evaluate-issue gate cross-reference (#3910) (#4104) ...
Why
Issue #4053: the recommended renderer readiness probe
curl --http2-prior-knowledge -fsS http://localhost:3800/readybreaks container startup. The container never becomes ready; removing-fsS"fixes" it (but then the probe always passes, defeating its purpose).Root cause (confirmed in
worker.ts):-f/--failmakes curl exit non-zero on any HTTP status>= 400, and/readyreturns503 {"status":"waiting_for_bundle"}during the cold-start window until the answering worker compiles its first bundle. A--failprobe against/readywith no warm-up path therefore deadlocks startup. This is intended gating behavior — but the docs lacked an explicit per-endpoint status-code contract and any Control Plane guidance, so the reporter hit the deadlock without a clear answer.Endpoint contract (from
worker.ts):/health→ always200/info→ always200/ready→200once a bundle is compiled, else503What changed
Documentation only —
docs/oss/building-features/node-renderer/health-checks.md:curl -fsS .../readybreaks startup, and what to probe instead (/healthor/infofor an always-passes probe; reserve--failagainst/readyfor setups with a warm-up path)./healthby default, switching to/readyonly with a warm-up path.This addresses all three of the issue's requests: a probe command that works, the
/ready+/infostatus-code contract, and the CPLN-specific probe options.Validation
npx prettier --checkpasses on the changed file..claude/docs/changelog-guidelines.md(docs fixes excluded).Fixes #4053
🤖 Generated with Claude Code
Note
Low Risk
Documentation-only changes with no runtime, security, or deployment code modifications.
Overview
Documents why
curl --failagainst/readycan deadlock container readiness during cold start, and how to probe safely.Adds a Status-Code Contract table for
/health,/info, and/ready(including when503 waiting_for_bundleis expected) plus guidance to use/healthor/infofor probes that must pass once the process is up, and/readywith--failonly when a warm-up path compiles the first bundle.Adds a Control Plane (CPLN) section explaining that HTTP probes cannot use the renderer’s h2c listener, with example
execCommand probes using h2c-awarecurlagainst/healthfor readiness and liveness. The same content is mirrored inllms-full.txt.Reviewed by Cursor Bugbot for commit 40ae833. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit
/readymay return503during cold startcurl --fail/-f/-fsS, warning against using/readyas a failing readiness gate without a warm-up pathAgent Merge Confidence (coordinator)
Merged by Claude Code coordinator under explicit maintainer (justin808) delegation — "merge authorization if confident + documented."
docs/oss/.../health-checks.md) + regeneratedllms-full.txt.llms-full.txtserialization);llms-full.txtregenerated vianode script/generate-llms-full.mjs— regeneration produced zero diff and the--checkguard passes (auto-merge was correct)./health→200,/ready→200/503waiting_for_bundleconfirmed againstpackages/react-on-rails-pro-node-renderer/src/worker.ts; CPLN probe schema confirmed againstreact_on_rails_pro/.controlplane/rails.yml.timeoutSeconds" and greptile "missing liveness probe" flags are stale/incorrect — the CPLN snippet already containstimeoutSeconds: 5on bothreadinessProbeandlivenessProbe.startupProbeomission is intentional and safe (probes/health, always 200; cold start cannot misfire). The-sfSand localhost-qualification notes are advisory/misreads. No actionable must-fix remained.mergeStateStatus: CLEAN, all checks pass/skip. Merge ledger sole hard violationunknown_review_decision— overridden by maintainer merge delegation. Changelognot_user_visible.llms-full.txtand must rebase+regenerate after this merge.