fix(windows): make OpenCode reliable on Windows — retry spurious spawnSync ETIMEDOUT (#46) by Cannon07 · Pull Request #85 · Cannon07/code-preview.nvim

Cannon07 · 2026-06-11T07:04:25Z

Summary

OpenCode's cross-platform plumbing already shipped (the index.ts Windows branch that runs hook-entry.ps1 via PowerShell, the libuv-based installer, the shared PS shim, and the platform.is_absolute resolver fix from #81 which the opencode normaliser shares). In practice, though, OpenCode-on-Windows dropped files under concurrent, multi-file proposals — the most common OpenCode pattern. This PR fixes that last gap.

The bug

When OpenCode fires hooks for a concurrent batch (e.g. several file writes in one turn), the first hook's execSync threw ETIMEDOUT after ~15 ms — not the 15 s timeout — so that file silently got no diff and no neo-tree marker. The rest of the batch succeeded.

Confirmed on a real Windows box across several 5-file runs: the first file always failed fast (ETIMEDOUT/SIGTERM, attempt 1, ~12–19 ms) while files 2–5 succeeded.

Root cause

A Node spawnSync behaviour: its timeout deadline is derived from libuv's cached loop time, which is refreshed only once per loop iteration. The first spawnSync that runs right after async work — here, the awaited enqueueHook in the tool hooks — sees a stale "now", so now + timeout is already in the past and libuv SIGTERMs the child the instant it spawns. Unix execs the .sh directly and doesn't hit this the same way, which is why it's Windows-only.

The fix (`backends/opencode/index.ts`)

runHook is now async and retries a spurious ETIMEDOUT — one that returns faster than SPURIOUS_TIMEOUT_MS (2 s) — but awaits a turn of the event loop first so libuv refreshes its cached clock. A synchronous retry would re-read the same stale value and fail again (which is exactly why the next hook in a burst always succeeds).
A genuine ~15 s timeout is not retried, so hang-protection is preserved.
enqueueHook awaits the async runHook, so the existing send-order serialization is unchanged.
Also documents that OpenCode's after-hook carries tool args on input (the before-hook on output) — confirmed against the live API while debugging.

Validation

End-to-end on a real Windows box with OpenCode, captured via temporary transport-layer logging (since removed):

the retry fired 7× across the runs, all recovered on attempt 1;
every 5-file burst now delivers all 5 pre + 5 post hooks to the shim (previously the first file was dropped);
single-file edits/writes/deletes, post-tool clearing, and absolute-path handling all behave correctly.

Unix is unaffected — the retry only triggers on a fast ETIMEDOUT, which the direct .sh exec path doesn't produce.

Follow-ups (filed, intentionally out of scope here)

Bash post-tool clears neo-tree markers globally, wiping concurrent Bash command markers #83 — bug: Bash post-tool clears neo-tree markers globally (wipes concurrent Bash command markers). Surfaced via OpenCode; backend-agnostic.
Single shared neo-tree across diff tabs, with active-tab file highlight #84 — enhancement: single shared neo-tree across diff tabs + active-tab highlight.
A transport-layer debug-logging facility behind CODE_PREVIEW_DEBUG (the instrumentation that made this debuggable, productized) — to be its own PR.

🤖 Generated with Claude Code

…in (#46) When OpenCode fires hooks for a concurrent batch (e.g. multiple file writes in one turn), the FIRST hook's execSync threw ETIMEDOUT after ~15ms — not the 15s timeout — so that file silently got no diff or neo-tree marker. Confirmed on a real Windows box: across several 5-file runs the first file always failed fast (ETIMEDOUT/SIGTERM, attempt 1, ~12-19ms) while the rest succeeded. Root cause is a Node spawnSync behaviour: its timeout deadline is derived from libuv's *cached* loop time, refreshed once per loop iteration. The first spawnSync right after async work (the awaited enqueueHook) sees a stale "now", so `now + timeout` is already in the past and libuv SIGTERMs the child the instant it spawns. Unix execs the .sh directly and doesn't hit this the same way (it's why the bug is Windows-only). Fix: make runHook async and retry a *spurious* ETIMEDOUT (ETIMEDOUT that returns faster than SPURIOUS_TIMEOUT_MS), but `await` a turn of the event loop first so libuv refreshes its cached time — a synchronous retry would re-read the same stale value (which is exactly why the *next* hook in a burst always succeeds). A genuine ~15s timeout is not retried, so hang-protection is preserved. enqueueHook now awaits the async runHook, so ordering is unchanged. Validated: retry fired 7x across the runs, all recovered on attempt 1; every 5-file burst now delivers all 5 pre/post hooks to the shim. Also documents that OpenCode's after-hook carries tool args on `input` (the before-hook on `output`) — confirmed against the live API. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Cannon07 · 2026-06-11T17:25:28Z

Review — OpenCode reliability on Windows

Excellent diagnosis and a correctly-targeted, well-scoped fix. The root cause (libuv's cached loop time making spawnSync's deadline already-expired on the first post-async spawn) is sophisticated, and yielding the event loop before retry directly addresses it. CI green; happy path covered on Unix, the retry/burst behavior validated manually on Windows.

Why it's correct

The yield is the crux. A synchronous retry would re-read the same stale libuv clock and fail identically — await setImmediate(...) before continue forces libuv to refresh its cached "now", which is why the next hook in a burst always succeeded.
Spurious vs. genuine discrimination preserves hang protection. Gating on elapsed < SPURIOUS_TIMEOUT_MS (2 s) means a real ~15 s hang is not retried — it still abstains and logs.
Bounded and total. Every path inside the loop returns; no unbounded retry, no fall-through.
Serialization preserved. hookQueue.then(async () => { await fn() }) resolves only after runHook fully resolves, so the setImmediate yields advance libuv/timers but don't let a later hook start early — send-order holds.
Unix genuinely unaffected — the retry only triggers on a fast ETIMEDOUT, which the direct .sh exec path doesn't produce.

CI coverage boundary (clarification, not a defect)

The refactor's happy path is covered on Unix CI: tests/backends/opencode/harness.ts imports the real index.ts and drives the hooks, so the async change didn't regress invocation/ordering.
The two things that matter most — the spurious-timeout retry branch (a Windows libuv quirk) and the concurrent multi-file burst (the harness fires one op at a time) — are inherently un-CI-able and rest on the manual Windows validation. That's the right evidence for this class of bug.

One question before merge

The retry decision keys only on err.code === "ETIMEDOUT", but the PR description notes the failure manifests as ETIMEDOUT/SIGTERM, and the logging branch still checks both:

const spurious = !!err && err.code === "ETIMEDOUT" && elapsed < SPURIOUS_TIMEOUT_MS

In practice Node's timeout-kill sets code: 'ETIMEDOUT' (and 7/7 recovery confirms it matches here), but Node has historically also surfaced timeout-kills as signal: 'SIGTERM' with a null code on some platforms — in which case the detector would miss and the file would silently drop again, reintroducing the bug. Since it's still gated by elapsed < 2 s, widening costs nothing on the genuine-hang path:

const spurious = !!err && (err.code === "ETIMEDOUT" || err.signal === "SIGTERM") && elapsed < SPURIOUS_TIMEOUT_MS

Was the SIGTERM case deliberately excluded, or is it safe to widen to match the original logging check?

Minor / optional

No regression guard for the retry logic. Hard to unit-test a libuv quirk, but harness.ts could grow a fault-injection action (mock execSync to throw one fast ETIMEDOUT then succeed) to assert recovery — cheap insurance against someone later "simplifying" the retry away. Given how painful this was to find, a guard has real value.
Log message lost its specificity ("timed out after 15s" → "timed out"). Including elapsed ms would distinguish a real hang from an exhausted-retry case in the field.
The comment documenting after-hook args on input vs before-hook on output is a good capture of a live-API asymmetry that would bite later.

Net: strong PR — the only thing I'd genuinely want resolved is the SIGTERM question; everything else is optional hardening.

…eout retry (#46) Review follow-up. The retry keyed only on `err.code === "ETIMEDOUT"`, but Node's timeout-kill can surface as `signal: 'SIGTERM'` with a null code on some platforms — there the spurious-timeout would be missed and the first hook of a concurrent burst would silently drop again. The logging branch already checked both, so the retry detection was narrower than the logging for no reason. Unify on a single `timedOut` predicate (code ETIMEDOUT OR signal SIGTERM), still gated on `elapsed < SPURIOUS_TIMEOUT_MS` so a genuine ~15s hang is never retried. Also restore elapsed ms in the timeout log so a real hang is distinguishable from exhausted spurious retries. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Review follow-up (optional hardening). The Windows libuv quirk that makes the first post-async spawnSync spuriously time out can't be reproduced on CI, so the retry was previously only covered by manual Windows validation — fragile against a later "simplification". Extract the retry loop into an exported, injectable `runWithSpuriousRetry(run, label)` (production behaviour unchanged — runHook just passes the execSync call as `run`), and add a unit guard that drives it with a fake `run`: - fast ETIMEDOUT recovers on retry (the core case) - fast SIGTERM-only recovers (guards the platform-variance fix) - a non-timeout error is NOT retried - a persistent fast timeout is bounded at MAX_HOOK_ATTEMPTS (no infinite loop) - success on first attempt runs once retry_test.ts imports the real index.ts and asserts call counts; test_retry.sh drives it via bun/npx tsx (no nvim needed) and skips cleanly when neither is present. Uses pathToFileURL so the dynamic import also works on Windows. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Cannon07 and others added 2 commits June 11, 2026 22:59

Cannon07 merged commit 01ca749 into main Jun 11, 2026
3 checks passed

Cannon07 deleted the feat/windows-opencode branch June 11, 2026 19:25

Cannon07 mentioned this pull request Jun 12, 2026

Request Windows 11 support #46

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(windows): make OpenCode reliable on Windows — retry spurious spawnSync ETIMEDOUT (#46)#85

fix(windows): make OpenCode reliable on Windows — retry spurious spawnSync ETIMEDOUT (#46)#85
Cannon07 merged 3 commits into
mainfrom
feat/windows-opencode

Cannon07 commented Jun 11, 2026

Uh oh!

Cannon07 commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Cannon07 commented Jun 11, 2026

Summary

The bug

Root cause

The fix (backends/opencode/index.ts)

Validation

Follow-ups (filed, intentionally out of scope here)

Uh oh!

Cannon07 commented Jun 11, 2026

Review — OpenCode reliability on Windows

Why it's correct

CI coverage boundary (clarification, not a defect)

One question before merge

Minor / optional

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

The fix (`backends/opencode/index.ts`)