Dev Containers extension host leaks hundreds of docker exec keep-alive shells, exhausting swap
Summary
A long-running Dev Containers connection leaks docker exec "keep-alive" shell processes on the host without ever reaping them. Over ~20 minutes the extension-host utility process spawned 792 such docker exec invocations, leaving ~525 live on the host simultaneously, with a matching ~528 vscode-remote-containers-server-*.js / server-main.js node processes piling up inside the target container.
None of these are individually large, so they are invisible in top/Task Manager sorted by RSS. In aggregate (~1,050 leaked processes out of 1,781 total on the host) they exhausted swap — 7.5 GiB of 8 GiB used — and made the machine unresponsive, even though physical RAM was not full (40 GiB free at the time).
The leak resolved itself when the connection dropped / the VS Code window reloaded: process count fell from 1,781 → 619 and swap drained from 7.5 GiB → 0.8 GiB with no other action.
Does this issue occur when all extensions are disabled?
Not yet tested — the leak is intermittent and tied to a long-lived container session, which makes a clean-profile repro slow to trigger. Will update if reproduced.
Environment
- VS Code Version: 1.124.2 (commit 6928394f91b684055b873eecb8bc281365131f1c, x64)
- Dev Containers extension: ms-vscode-remote.remote-containers 0.459.1
- Local OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-35-generic
- Remote / connection type: Dev Containers (Containers)
- Docker: 29.5.3 (build d1c06ef)
- Container image: vsc-prometheus-… (a workspace dev container)
Evidence captured during the incident
Process aggregation by command name (RSS sums double-count shared pages, so the count is the meaningful figure):
=== TOP COMMANDS BY PROCESS COUNT ===
527 MainThread <- vscode-remote-containers-server-*.js / server-main.js (inside container)
524 docker <- `docker exec` keep-alive shells (on host)
31 code
...
total processes on host: 1781
The host docker exec processes are all keep-alive shells targeting the same container, e.g.:
docker exec -i -u root <containerId> /bin/sh -c \
echo "Container already running. Keep-alive process started." ; \
export VSCODE_REMOTE_CONTAINERS_SESSION=<sessionId> ; /bin/sh
docker exec -i -u <user> -e VSCODE_REMOTE_CONTAINERS_SESSION=<sessionId> <containerId> /bin/sh
All ~792 share a single parent: a VS Code utility node process
(code --type=utility --utility-sub-type=node.mojom.NodeService …, the Dev Containers
extension host), which itself is a child of the main code process. That parent had been
alive only ~19m35s yet had already spawned 792 execs, and was still spawning new ones
at the time of capture (oldest survivor ~10m old, newest 0s old).
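To check the single-parent observation on another machine, grouping docker client processes by parent PID should be enough. A minimal sketch, assuming procps ps; count_by_parent is a made-up helper name, not part of any tool:

```shell
# Count processes per parent PID for one command name (default: docker).
# Input (stdin): lines of "<ppid> <comm>", e.g. from: ps -eo ppid=,comm=
# count_by_parent is a hypothetical helper name used for illustration.
count_by_parent() {
  awk -v name="${1:-docker}" '
    $2 == name { c[$1]++ }
    END { for (p in c) printf "%d %s\n", c[p], p }
  ' | sort -rn
}

# Usage on a live host (prints "<count> <ppid>", largest first):
#   ps -eo ppid=,comm= | count_by_parent docker | head
```

A single PPID holding nearly all of the docker processes would match what was seen here; `ps -o args= -p <ppid>` then shows whether that parent is the utility node process.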
Inside the container, the leaked processes are VS Code server instances:
/home/<user>/.vscode-server/bin/<hash>/node /tmp/vscode-remote-containers-server-<uuid>.js
/vscode/vscode-server/bin/linux-x64/<hash>/node …/out/server-main.js …
cgroup: /system.slice/docker-<containerId>.scope
Memory state during vs. after the incident:
during: Mem 21Gi used / 40Gi free | Swap 7.5Gi used / 0.5Gi free (1781 procs)
after: Mem 8Gi used / 53Gi free | Swap 0.8Gi used / 7.2Gi free ( 619 procs)
Steps to Reproduce (suspected)
- Open a folder in a Dev Container and keep the connection alive for an extended period (hours).
- Observe over time the count of docker exec … "Container already running. Keep-alive process started." processes on the host (pgrep -x docker | wc -l) and vscode-remote-containers-server node processes inside the container.
- The counts grow into the hundreds rather than staying flat; the keep-alive execs are re-spawned but the prior ones are never reaped.
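The observation step above can be left running in a terminal as a small sampler. This is only a sketch: sample_leak is a hypothetical helper name, and it assumes (as in this incident) that the host-side execs show up under the command name docker and the in-container servers under MainThread.

```shell
# Print "<epoch-seconds> <docker procs> <MainThread procs>" every INTERVAL
# seconds, COUNT times. A flat series is healthy; a steadily rising one
# matches the leak described in this report.
# sample_leak is a hypothetical helper; the body runs in a subshell so its
# variables do not leak into the calling shell.
sample_leak() (
  interval="${1:-10}"
  count="${2:-60}"
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s %s\n' "$(date +%s)" \
      "$(pgrep -x docker | wc -l)" \
      "$(pgrep -x MainThread | wc -l)"
    i=$((i + 1))
    # no sleep needed after the final sample
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
)

# Usage: one sample every 30 s for an hour, kept for later comparison
#   sample_leak 30 120 | tee leak-samples.txt
```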
Expected
Exactly one keep-alive shell (and its corresponding server) should exist per active Dev Containers session; stale ones should be reaped when superseded or when the connection drops.
Actual
Keep-alive execs and inner server processes accumulate without bound for the lifetime of the connection, eventually exhausting swap and degrading the whole machine.
Diagnostic one-liners
# host-side leaked keep-alive execs
pgrep -x docker | wc -l
# inner leaked server processes
pgrep -x MainThread | wc -l
# aggregate RSS + count by command (count is the real signal)
ps -eo rss,comm --no-headers | awk '{a[$2]+=$1;c[$2]++} END{for(k in a) printf "%10.1f MB x%-6d %s\n",a[k]/1024,c[k],k}' | sort -rn | head
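Because swap, not RSS, is what filled up, a per-command sum of swapped-out memory may be a more direct signal than the RSS aggregate. The sketch below reads the Name: and VmSwap: fields from /proc/<pid>/status on Linux; swap_by_command is a hypothetical helper, and the optional directory argument exists only so it can be pointed at a copy of /proc for testing.

```shell
# Sum swapped-out memory (VmSwap, reported in kB) and process count per
# command name. Kernel threads have no VmSwap line and are skipped.
# Only the first word of Name: is used, so names with spaces are truncated.
# swap_by_command is a hypothetical helper; $1 overrides /proc for testing.
swap_by_command() {
  cat "${1:-/proc}"/[0-9]*/status 2>/dev/null | awk '
    $1 == "Name:"   { name = $2 }          # first line of each status file
    $1 == "VmSwap:" { kb[name] += $2; n[name]++ }
    END { for (k in kb) printf "%10.1f MB x%-6d %s\n", kb[k] / 1024, n[k], k }
  ' | sort -rn | head
}

# Usage:
#   swap_by_command
```

The output uses the same columns as the RSS one-liner, so the two can be compared side by side.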
UPDATE (refined root cause): auto-forwarded extension-host agent port triggers a relay storm
After reading the Dev Containers extension log
(~/.config/Code/logs/<session>/window2/exthost/ms-vscode-remote.remote-containers/remoteContainers-*.log),
the dominant driver is not the keep-alive /bin/sh shells — it is a port-forwarding relay storm against VS Code's own server agent port.
The forwarded port is the Extension Host Agent itself. Log at container startup:
[..05:33:53.650Z] Server bound to 127.0.0.1:36021 (IPv4)
Extension host agent listening on 36021
[..05:33:53.654Z] Port forwarding for container port 36021 starts listening on local port.
[..05:33:53.655Z] Port forwarding local port 36903 to container port 36021
So the extension forwarded the server's own internal agent port (container 36021 -> host 36903). Something on the host then opens connections to localhost:36903 continuously, and each connection spawns a fresh docker exec ... node -e <portforward relay> inside the container:
[..] Port forwarding connection from <ephemeralPort> > 36903 > 36021 in the container.
[..] Start: Run in container: <user>/.vscode-server/bin/<hash>/node -e
[..] Stop (12x ms): Run in container: ...node -e
[..] Port forwarding <ephemeralPort> > 36903 > 36021 terminated by extension (closed) with code 0
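The log does not say what is connecting to the forwarded port. One way to look, sketched under the assumption that iproute2 ss is available and that the local port is 36903 as in this session, is to tally the owning process of each client socket whose destination is that port. clients_by_process is a hypothetical helper; the connections are short-lived, so a snapshot can easily miss them and several runs may be needed.

```shell
# Tally owning process names from `ss -tnp` output read on stdin.
# Relies on the users:(("name",pid=N,fd=M)) field that ss -p prints; only
# the first owner of each socket is counted. Sockets without process info
# (e.g. TIME-WAIT) are ignored.
# clients_by_process is a hypothetical helper name.
clients_by_process() {
  grep -oE 'users:\(\("[^"]+"' | cut -d'"' -f2 | sort | uniq -c | sort -rn
}

# Usage (36903 is the forwarded local port from this session's log):
#   ss -tnp 'dport = :36903' | clients_by_process
```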
Rate / volume in one 22-minute window (13:52 -> 14:14):
Port forwarding connection from ... : 4,366
Start: Run in container (docker exec): 4,442
sustained peak: 15 new connections/second
The connecting source ports span a wide ephemeral range (32770–60994), i.e. many short-lived fresh TCP connections — a poll/probe loop, not one persistent client. The relay docker execs and their inner node servers accumulate faster than they are reaped, producing ~1,050 simultaneously-live leaked processes.
Why it presents as "out of memory" but isn't: physical RAM stayed ~40 GiB free throughout; it is swap that filled (7.5 / 8 GiB). Each leaked proc is only ~30–90 MB so none is visible in top/Task Manager sorted by memory — only the process count (1,781 total, ~1,050 of them this leak) reveals it.
Self-recovery: when the window reloaded / the connection dropped, the extension-host parent process exited and the OS reaped all ~1,050 children at once; swap drained 7.5 GiB -> 0.8 GiB, process count 1,781 -> 619, with no manual intervention.
Likely bug: the agent's own port should not be auto-forwarded-and-relayed per-connection like a user app port, and/or stale relay execs are not reaped while connections keep arriving. A pkill of the execs is ineffective because they respawn as long as connections keep hitting the forwarded port.
Additional repro signal
# in the Dev Containers ext log, count relay spawns and the per-second rate:
grep -c "Start: Run in container" <remoteContainers-*.log>
grep "Port forwarding connection from" <log> | grep -oE '^\[[0-9T:-]+' | cut -c1-20 | uniq -c | sort -rn | head
# confirm the forwarded port is the agent port:
grep -nE "Extension host agent listening on|Port forwarding local port .* to container port" <log>
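To see whether relays are opened faster than they are closed, the connection and termination lines can be balanced against each other. A rough sketch, assuming the two log line shapes are exactly as quoted in this report; relay_balance is a hypothetical helper, and a "terminated" log line does not by itself prove the docker exec process was reaped.

```shell
# Balance relay opens against relay terminations in a Dev Containers log.
# Matches the two line shapes quoted in this report:
#   "Port forwarding connection from <port> > <local> > <container> ..."
#   "Port forwarding <port> > <local> > <container> terminated ..."
# relay_balance is a hypothetical helper; pass one or more log files.
relay_balance() {
  awk '
    /Port forwarding connection from [0-9]+ > / {
      opened++; cur++
      if (cur > peak) peak = cur
    }
    /Port forwarding [0-9]+ > [0-9]+ > [0-9]+ terminated/ {
      closed++
      if (cur > 0) cur--
    }
    END {
      printf "opened=%d closed=%d in_flight=%d peak_in_flight=%d\n",
             opened, closed, cur, peak
    }
  ' "$@"
}

# Usage:
#   relay_balance <remoteContainers-*.log>
```

A large in_flight or peak_in_flight value would point at relays outliving their connections; values near zero alongside a high process count would instead suggest the execs are not reaped after termination is logged.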