Dev Containers: auto-forwarded extension-host agent port storms `docker exec` relays (~15/s), leaking thousands of procs until swap is exhausted

# Dev Containers extension host leaks hundreds of `docker exec` keep-alive shells, exhausting swap

## Summary

A long-running Dev Containers connection leaks `docker exec` "keep-alive" shell processes on the host without ever reaping them. Over ~20 minutes the extension-host utility process spawned **792** such `docker exec` invocations, leaving **~525 live on the host** simultaneously, with a matching **~528 `vscode-remote-containers-server-*.js` / `server-main.js` node processes piling up inside the target container**.

None of these are individually large, so they are invisible in `top`/Task Manager sorted by RSS. In aggregate (~1,050 leaked processes out of 1,781 total on the host) they exhausted swap — **7.5 GiB of 8 GiB used** — and made the machine unresponsive, even though physical RAM was not full (40 GiB free at the time).

The leak resolved itself when the connection dropped / the VS Code window reloaded: the process count fell from 1,781 to 619 and swap usage drained from 7.5 GiB to 0.8 GiB with no other action.

## Does this issue occur when all extensions are disabled?

Not yet tested — the leak is intermittent and tied to a long-lived container session, which makes a clean-profile repro slow to trigger. Will update if reproduced.

## Environment

- **VS Code Version:** 1.124.2 (commit `6928394f91b684055b873eecb8bc281365131f1c`, x64)
- **Dev Containers extension:** ms-vscode-remote.remote-containers 0.459.1
- **Local OS:** Ubuntu 24.04.4 LTS, kernel 6.17.0-35-generic
- **Remote / connection type:** Dev Containers (Containers)
- **Docker:** 29.5.3 (build d1c06ef)
- **Container image:** `vsc-prometheus-…` (a workspace dev container)

## Evidence captured during the incident

Process aggregation by command name (RSS sums double-count shared pages, so the **count** is the meaningful figure):

```
=== TOP COMMANDS BY PROCESS COUNT ===
    527 MainThread        <- vscode-remote-containers-server-*.js / server-main.js (inside container)
    524 docker            <- `docker exec` keep-alive shells (on host)
     31 code
    ...
total processes on host: 1781
```

The host `docker exec` processes are all keep-alive shells targeting the **same** container, e.g.:

```
docker exec -i -u root <containerId> /bin/sh -c \
  echo "Container already running. Keep-alive process started." ; \
  export VSCODE_REMOTE_CONTAINERS_SESSION=<sessionId> ; /bin/sh

docker exec -i -u <user> -e VSCODE_REMOTE_CONTAINERS_SESSION=<sessionId> <containerId> /bin/sh
```

All 792 execs share a single parent: a VS Code utility node process
(`code --type=utility --utility-sub-type=node.mojom.NodeService …`, the Dev Containers
extension host), which itself is a child of the main `code` process. That parent had been
alive only ~19m35s yet had already spawned 792 execs, and was **still spawning new ones**
at the time of capture (oldest survivor ~10m old, newest 0s old).
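
The single-parent observation can be re-checked on a live host by grouping `docker` processes by parent PID. The helper below is a minimal sketch, not part of the original capture: the name `count_docker_by_parent` is hypothetical, and it assumes a procps-style `ps` on Linux.

```bash
# Count processes whose command name is exactly "docker", grouped by parent PID.
# Input (stdin): lines of "<ppid> <comm>", e.g. from `ps -eo ppid=,comm=`.
# Output: "<count> <ppid>" per parent, largest count first.
count_docker_by_parent() {
  awk '$2 == "docker" { n[$1]++ }
       END { for (p in n) printf "%d %s\n", n[p], p }' | sort -rn
}

# Usage on a live host (assumed invocation):
#   ps -eo ppid=,comm= | count_docker_by_parent | head -n 5
```

A single dominant parent PID in the output, matching the utility node process, would be consistent with the evidence above.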

Inside the container, the leaked processes are VS Code server instances:

```
/home/<user>/.vscode-server/bin/<hash>/node /tmp/vscode-remote-containers-server-<uuid>.js
/vscode/vscode-server/bin/linux-x64/<hash>/node …/out/server-main.js …
cgroup: /system.slice/docker-<containerId>.scope
```

Memory state during vs. after the incident:

```
during:  Mem 21Gi used / 40Gi free  |  Swap 7.5Gi used / 0.5Gi free   (1781 procs)
after:   Mem  8Gi used / 53Gi free  |  Swap 0.8Gi used / 7.2Gi free   ( 619 procs)
```
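
Since swap rather than RAM was the resource that ran out, it can help to record swap usage alongside the process count. The following is a minimal sketch, assuming Linux `/proc/meminfo` with its usual `SwapTotal:` and `SwapFree:` lines in kB; `swap_used_mib` is a hypothetical helper name.

```bash
# Print swap in use, in MiB, from meminfo-formatted text on stdin.
swap_used_mib() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { avail = $2 }
       END { printf "%d\n", (total - avail) / 1024 }'
}

# Usage on a live host (assumed invocation):
#   swap_used_mib < /proc/meminfo
```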

## Steps to Reproduce (suspected)

1. Open a folder in a Dev Container and keep the connection alive for an extended period (in the captured incident, about 20 minutes was enough).
2. Observe over time the count of `docker exec … "Container already running. Keep-alive process started."` processes on the host (`pgrep -x docker | wc -l`) and `vscode-remote-containers-server` node processes inside the container.
3. The counts grow into the hundreds rather than staying flat; the keep-alive execs are re-spawned but the prior ones are never reaped.

## Expected

Exactly one keep-alive shell (and its corresponding server) should exist per active Dev Containers session; stale ones should be reaped when superseded or when the connection drops.

## Actual

Keep-alive execs and inner server processes accumulate without bound for the lifetime of the connection, eventually exhausting swap and degrading the whole machine.

## Diagnostic one-liners

```bash
# host-side leaked keep-alive execs
pgrep -x docker | wc -l

# inner leaked server processes
pgrep -x MainThread | wc -l

# aggregate RSS + count by command (count is the real signal)
ps -eo rss,comm --no-headers | awk '{a[$2]+=$1;c[$2]++} END{for(k in a) printf "%10.1f MB  x%-6d  %s\n",a[k]/1024,c[k],k}' | sort -rn | head
```
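
Because the process count is the signal that matters here, a variant that ranks commands by count rather than by RSS may be easier to read. This is a minimal sketch; `top_by_count` is a hypothetical helper name, and it assumes one command name per input line.

```bash
# Rank command names by number of processes.
# Input (stdin): one command name per line, e.g. from `ps -eo comm=`.
# Optional argument: how many rows to print (default 10).
top_by_count() {
  awk '{ c[$0]++ }
       END { for (k in c) printf "%d %s\n", c[k], k }' |
    sort -k1,1nr -k2 | head -n "${1:-10}"
}

# Usage on a live host (assumed invocation):
#   ps -eo comm= | top_by_count 5
```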

---

## UPDATE — refined root cause: auto-forwarded extension-host agent port storms relays

After reading the Dev Containers extension log
(`~/.config/Code/logs/<session>/window2/exthost/ms-vscode-remote.remote-containers/remoteContainers-*.log`),
the dominant driver is **not** the keep-alive `/bin/sh` shells — it is a **port-forwarding relay storm against VS Code's own server agent port**.

**The forwarded port is the Extension Host Agent itself.** Log at container startup:

```
[..05:33:53.650Z] Server bound to 127.0.0.1:36021 (IPv4)
Extension host agent listening on 36021
[..05:33:53.654Z] Port forwarding for container port 36021 starts listening on local port.
[..05:33:53.655Z] Port forwarding local port 36903 to container port 36021
```

So the extension forwarded the server's **own internal agent port** (container `36021` -> host `36903`). Something on the host then opens connections to `localhost:36903` continuously, and **each connection spawns a fresh `docker exec ... node -e <portforward relay>` inside the container**:

```
[..] Port forwarding connection from <ephemeralPort> > 36903 > 36021 in the container.
[..] Start: Run in container: <user>/.vscode-server/bin/<hash>/node -e
[..] Stop (12x ms): Run in container: ...node -e
[..] Port forwarding <ephemeralPort> > 36903 > 36021 terminated by extension (closed) with code 0
```

**Rate / volume in one 22-minute window** (13:52 -> 14:14):

```
Port forwarding connection from ... : 4,366
Start: Run in container (docker exec): 4,442
sustained peak: 15 new connections/second
```
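
The counts above also give a window-average rate to set against the 15/s peak: 4,366 connections over 22 minutes works out to roughly 3.3 per second, which suggests the connections arrive in bursts rather than at a constant rate. A minimal sketch of that arithmetic, with `avg_per_second` as a hypothetical helper name:

```bash
# Average events per second over a window.
# Arguments: <event count> <window length in minutes>
avg_per_second() {
  awk -v n="$1" -v m="$2" 'BEGIN { printf "%.1f\n", n / (m * 60) }'
}

# Figures from the 22-minute window in the log:
#   avg_per_second 4366 22   # connections
#   avg_per_second 4442 22   # docker exec starts
```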

The connecting source ports span a wide ephemeral range (32770–60994), i.e. **many short-lived fresh TCP connections** — a poll/probe loop, not one persistent client. The relay `docker exec`s and their inner `node` servers accumulate faster than they are reaped, producing ~1,050 simultaneously-live leaked processes.
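
The source-port spread can be summarized directly from the log. The sketch below assumes the line format shown in the excerpts (`Port forwarding connection from <port> > ...`); `source_port_range` is a hypothetical helper name.

```bash
# Summarize the client source ports seen in "Port forwarding connection from" lines.
# Input (stdin): Dev Containers extension log text.
source_port_range() {
  awk '/Port forwarding connection from/ {
         # The source port is the field right after the word "from".
         for (i = 1; i < NF; i++) if ($i == "from") { p = $(i + 1) + 0; break }
         if (!seen[p]++) distinct++
         if (min == "" || p < min) min = p
         if (p > max) max = p
       }
       END {
         if (distinct)
           printf "%d distinct source ports, range %d-%d\n", distinct, min, max
       }'
}

# Usage (assumed invocation; LOG is the path to the remoteContainers-*.log file):
#   source_port_range < "$LOG"
```

Many distinct ports relative to the number of connections would support the reading that each connection is a fresh short-lived socket.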

**Why it presents as "out of memory" but isn't:** physical RAM stayed ~40 GiB free throughout; it is **swap** that filled (7.5 / 8 GiB). Each leaked proc is only ~30–90 MB so none is visible in `top`/Task Manager sorted by memory — only the process *count* (1,781 total, ~1,050 of them this leak) reveals it.

**Self-recovery:** when the window reloaded / the connection dropped, the extension-host parent process exited and the OS reaped all ~1,050 children at once; swap drained 7.5 GiB -> 0.8 GiB, process count 1,781 -> 619, with no manual intervention.

**Likely bug:** the agent's own port should not be auto-forwarded-and-relayed per-connection like a user app port, and/or stale relay execs are not reaped while connections keep arriving. A `pkill` of the execs is ineffective because they respawn as long as connections keep hitting the forwarded port.

### Additional repro signal

```bash
# Set LOG to the Dev Containers extension log (remoteContainers-*.log) first.
: "${LOG:?set LOG to the remoteContainers-*.log path}"
# count relay spawns and the per-second connection rate:
grep -c "Start: Run in container" "$LOG"
grep "Port forwarding connection from" "$LOG" | grep -oE '^\[[0-9T:-]+' | cut -c1-20 | uniq -c | sort -rn | head
# confirm the forwarded port is the agent port:
grep -nE "Extension host agent listening on|Port forwarding local port .* to container port" "$LOG"
```
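
The agent-port check can also be automated instead of compared by eye. This is a minimal sketch, assuming the two log line formats shown in the startup excerpt (`Extension host agent listening on <port>` and `Port forwarding local port <local> to container port <port>`); `agent_port_forwarded` is a hypothetical helper name.

```bash
# Report every container port that is both the extension host agent port
# and the target of a port-forwarding rule.
# Input (stdin): Dev Containers extension log text.
agent_port_forwarded() {
  awk '
    /Extension host agent listening on/ { agent[$NF] = 1 }
    # "... local port <local> to container port <container>":
    # the container port is the last field, the local port is four fields earlier.
    /Port forwarding local port .* to container port/ { fwd[$NF] = $(NF - 4) }
    END {
      for (p in agent) if (p in fwd)
        printf "agent port %s forwarded to local port %s\n", p, fwd[p]
    }'
}

# Usage (assumed invocation; LOG is the path to the remoteContainers-*.log file):
#   agent_port_forwarded < "$LOG"
```

Any output line indicates that the server's own agent port was picked up by auto-forwarding; no output means no match was found in that log.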

