feat(anc): wire check-hotfix into node wrapper behind ENABLE_PROVISIONING_HOTFIX #8715
ede050a to
0c90761
…2.1b)
Add a fail-open 'check-hotfix' CLI subcommand that reads the
kube-system/anc-hotfix-version ConfigMap published by the
live-patching-controller and stages the resolved {hotfixes:{...}} pointer
to the path download-hotfix already reads. download-hotfix keeps its
unchanged patch-only, strictly-higher gating; check-hotfix only fetches and
writes the pointer.
- Raw net/http HTTPS GET (no client-go); creds from AKSNodeConfig bootstrap
token + apiserver FQDN (primary) or on-node kubeconfigs (secondary).
- Shares the 2.1a hotfixConfig parser/data contract with download-hotfix.
- Always exits 0; emits CheckHotfix telemetry (configMapRead,
noHotfixForBase, customDataFallback, failed).
- PoC cold-start fallback reads a lenient top-level hotfixes object from the
node config when the ConfigMap read fails (TODO: typed absvc contract).
- Injectable App fields (checkHotfixConfigMapFetcher, nodeConfigPath) for
network-free unit tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0c90761 to
b33ec66
Add a default-off ANC_HOTFIX_ENABLED-gated call to the 2.1b check-hotfix subcommand in aks-node-controller-wrapper.sh, placed before the existing download-hotfix block since check-hotfix refreshes the hotfix pointer that block consumes. The call is fail-open and wrapped defensively so it can never block provisioning. When the flag is unset/non-true the wrapper behaves exactly as before (6-month VHD backward compat).
Parameterize HOTFIX_JSON to match the existing path-var pattern and enable shellspec coverage of the download-hotfix branch. Add shellspec tests for flag off, flag on ordering, fail-open, and non-true value handling.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify that the check-hotfix non-zero (fail-open) case also models a node whose VHD-baked binary predates 2.1b, where check-hotfix is an unknown subcommand.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Match the design's EnableProvisioningHotfix aks-rp region toggle and AgentBaker's contract->env naming convention (EnableIMDSRestriction -> ENABLE_IMDS_RESTRICTION), so the toggle -> absvc -> ANC opt-in chain stays traceable. No behavior change; still default-off and fail-open.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
f842590 to
3ebabf0
The hotfix pointer read channel moved from the kube-system ConfigMap (apiserver + bootstrap token) to the LPS endpoint (IMDS-attested); the fetch/auth rewrite lives in 2.1b. The wrapper's check-hotfix -> download-hotfix call contract, the ENABLE_PROVISIONING_HOTFIX gate, and the fail-open semantics are unchanged - only the explanatory comment is updated to name the new read channel accurately.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Read-channel pivot: the hotfix-pointer read moves from Option 2 (kube-system anc-hotfix-version ConfigMap via apiserver + bootstrap token) to Option 4 (LPS endpoint, IMDS-attested), validated by e2e showing the node can reach LPS pre-kubelet. The fetch/auth rewrite lives in #8696 (2.1b). This wrapper wiring is channel-agnostic: the check-hotfix -> download-hotfix call sequence, the ENABLE_PROVISIONING_HOTFIX gate (relaxed by 2.1d via the enable_provisioning_hotfix contract field), and the fail-open semantics are all unchanged. Only comments/wording were updated to name the new read channel.
Changes cached containers or packages on Windows VHDs
Please get a Windows SIG member to approve. The following diff file shows any additions or deletions from what will be cached on Windows VHDs, organised by VHD type.
diff --git a/vhd_files/2022-containerd-gen2.txt b/vhd_files/2022-containerd-gen2.txt
index 7039bac..c51a47f 100644
--- a/vhd_files/2022-containerd-gen2.txt
+++ b/vhd_files/2022-containerd-gen2.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
diff --git a/vhd_files/2022-containerd.txt b/vhd_files/2022-containerd.txt
index 5915cf1..7312c49 100644
--- a/vhd_files/2022-containerd.txt
+++ b/vhd_files/2022-containerd.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
diff --git a/vhd_files/2025-gen2.txt b/vhd_files/2025-gen2.txt
index 37d9326..36e3641 100644
--- a/vhd_files/2025-gen2.txt
+++ b/vhd_files/2025-gen2.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
diff --git a/vhd_files/2025.txt b/vhd_files/2025.txt
index 5b08280..b8873d5 100644
--- a/vhd_files/2025.txt
+++ b/vhd_files/2025.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
07b497b to
0d6f945
2.1c - Wire check-hotfix into the node wrapper (shell only)
POC / M1 draft. Shell-only wiring for the Provisioning-Hotfix flow. No Go changes.
Enablement (where this sits in the rollout chain)
This env gate is the on-node terminal of the design's region-staged opt-in:
EnableProvisioningHotfix aks-rp toggle (AKS Toggles-as-code, per region) -> absvc respects toggle -> ANC respects toggle. This PR implements only the last hop ("ANC
respects toggle"). The env var name mirrors the toggle/contract name to match the
existing contract->env convention (e.g. EnableIMDSRestriction -> ENABLE_IMDS_RESTRICTION),
keeping the chain traceable. Wiring absvc to render this var from a contract field is a
separate follow-up PR; the aks-rp toggle + toggle YAML live in the aks-rp repo. Until
those land, the var renders unset everywhere, so this change is inert (default-off).
Note: 2.1d (#8717) relaxes this env gate, moving the on/off decision into the Go binary
via the enable_provisioning_hotfix contract field (single source of truth). This PR
intentionally ADDS the gate; #8717 relaxes it, so each PR stays reviewable on its own.
What this does
Adds one call to the check-hotfix subcommand (added in 2.1b) inside
aks-node-controller-wrapper.sh, gated behind a new env flag ENABLE_PROVISIONING_HOTFIX
that is OFF by default.
check-hotfix reads the hotfix pointer from the LPS endpoint (IMDS-attested) and refreshes
$HOTFIX_JSON, which the existing download-hotfix block consumes - so it must run
first. The call is fail-open (the command always exits 0) and additionally wrapped
defensively so it can never block provisioning.
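The ordering and gating described above can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code: the function name run_hotfix_steps, the log helper, the BIN_PATH fallback, and the shape of the download-hotfix block (a file-existence check followed by a bare download-hotfix call) are all assumptions made for the example.

```sh
#!/bin/sh
# Illustrative sketch only; anything marked "assumed" is not taken from the PR.

BIN_PATH="${BIN_PATH:-./aks-node-controller}"   # assumed fallback, for illustration
HOTFIX_JSON="${HOTFIX_JSON:-/opt/azure/containers/aks-node-controller-hotfix.json}"

log() { printf '%s\n' "$*"; }                   # assumed logging helper

run_hotfix_steps() {
    # Gate: only the literal string "true" enables the new call.
    if [ "${ENABLE_PROVISIONING_HOTFIX:-}" = "true" ]; then
        # Fail-open: a non-zero exit is logged and otherwise ignored.
        if "$BIN_PATH" check-hotfix; then
            log "check-hotfix completed"
        else
            log "check-hotfix failed; continuing (fail-open)"
        fi
    fi

    # Stand-in for the existing download-hotfix block (its real shape is assumed).
    # It runs after check-hotfix so it sees the refreshed pointer file.
    if [ -f "$HOTFIX_JSON" ]; then
        "$BIN_PATH" download-hotfix || log "download-hotfix failed; continuing"
    fi
}
```

With the flag off, the first branch is skipped entirely and only the pre-existing download-hotfix stand-in runs, which is the "unchanged" path.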
Default-off / fail-open guarantee
When ENABLE_PROVISIONING_HOTFIX is unset, empty, or any value other than the literal
string true, the wrapper behaves EXACTLY as it does today. This preserves the
6-month VHD backward-compatibility window: older VHDs running newer CSE, and newer
VHDs running older CSE, are unaffected unless the flag is explicitly turned on.
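The gate is a strict string comparison rather than a general truthiness check. A minimal sketch of that predicate follows, assuming a hypothetical helper name hotfix_enabled (the real wrapper may simply inline the test):

```sh
#!/bin/sh
# hotfix_enabled is a hypothetical helper name used only for this sketch.
# It succeeds only when ENABLE_PROVISIONING_HOTFIX is exactly the string "true".
hotfix_enabled() {
    [ "${ENABLE_PROVISIONING_HOTFIX:-}" = "true" ]
}

# Print how a few representative values are treated.
for value in "" "true" "TRUE" "1" "yes" "false"; do
    ENABLE_PROVISIONING_HOTFIX="$value"
    if hotfix_enabled; then
        printf '%-8s -> check-hotfix runs\n' "'$value'"
    else
        printf '%-8s -> unchanged behavior\n' "'$value'"
    fi
done
```

Treating values such as TRUE, 1, or yes as off keeps the default-off guarantee narrow: only an explicit, exact opt-in changes wrapper behavior.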
Known-safe: old VHD + flag on
If ENABLE_PROVISIONING_HOTFIX=true ever reaches a node whose VHD-baked ANC binary predates
2.1b, "$BIN_PATH" check-hotfix is an unknown subcommand and exits non-zero. The
if ... else log "...continuing (fail-open)" fi wrapper swallows that error, so
provisioning still proceeds unchanged. This path is covered by shellspec case 4 below
(check-hotfix exits non-zero -> wrapper still provisions), which models the missing
subcommand. This matters for the 6-month VHD support window.
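The old-VHD case can be modelled with a stub that rejects unknown subcommands. The sketch below uses a hypothetical shell function old_anc in place of a pre-2.1b binary, plus a placeholder provision step and an assumed log helper; none of these is the real binary's interface.

```sh
#!/bin/sh
# old_anc is a hypothetical stand-in for a VHD-baked binary that predates 2.1b:
# it accepts a placeholder "provision" step and rejects every other subcommand.
old_anc() {
    case "$1" in
        provision) echo "provisioning"; return 0 ;;
        *) echo "unknown command: $1" >&2; return 1 ;;
    esac
}

BIN_PATH=old_anc                     # point the sketch at the stub
log() { printf '%s\n' "$*"; }        # assumed logging helper

wrapper_sketch() {
    if [ "${ENABLE_PROVISIONING_HOTFIX:-}" = "true" ]; then
        if "$BIN_PATH" check-hotfix 2>/dev/null; then
            log "check-hotfix completed"
        else
            # The unknown-subcommand failure is swallowed here (fail-open).
            log "check-hotfix failed; continuing (fail-open)"
        fi
    fi
    # Provisioning proceeds regardless of the check-hotfix outcome.
    "$BIN_PATH" provision
}
```

Because the non-zero exit is consumed by the if/else, the sketch's exit status comes solely from the provisioning step.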
Before / after flow
Flag off (default - unchanged):
Flag on (ENABLE_PROVISIONING_HOTFIX=true):
Notes
- check-hotfix takes no flags/args; it reads the AKSNodeConfig from its default on-node path internally for the LPS endpoint (IMDS-attested) that it reads, so the wrapper passes nothing.
- HOTFIX_JSON is parameterized as ${HOTFIX_JSON:-<default>} to match the existing BIN_PATH / CONFIG_PATH / NBC_CMD_PATH pattern and to allow shellspec to exercise the download-hotfix branch. Production default path is unchanged.
- check-hotfix writes to defaultHotfixVersionPath (/opt/azure/containers/aks-node-controller-hotfix.json, hotfix.go) and download-hotfix reads the same constant. The wrapper's HOTFIX_JSON default is byte-identical, and the Go hotfixVersionPath override exists only for tests (no env/production override and check-hotfix takes no path flag), so the two never diverge on a node.
- POSIX sh constructs only ([ ], =, ${VAR:-}); passes shellcheck generic + POSIX (SC3010/SC3014) and the wrapper shellspec suite (8 examples, 0 failures).
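The ${VAR:-default} pattern mentioned in the notes can be illustrated in isolation. In this sketch, resolve_hotfix_json is a hypothetical helper introduced only for the example; the default path is the one quoted in the notes.

```sh
#!/bin/sh
# resolve_hotfix_json is a hypothetical helper for illustration only.
# ${HOTFIX_JSON:-...} falls back to the default when the variable is unset OR empty,
# so production nodes (which do not set it) keep the existing path.
resolve_hotfix_json() {
    printf '%s\n' "${HOTFIX_JSON:-/opt/azure/containers/aks-node-controller-hotfix.json}"
}
```

A test harness can point the wrapper at a scratch file by setting HOTFIX_JSON before invoking it, while an unmodified node resolves to the production default.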
Tests
New shellspec cases in aks_node_controller_wrapper_spec.sh:
Stack
Base is set to the 2.1b branch so the diff shows only the wrapper + shellspec changes.
Will retarget to main as the stack merges down.
This unblocks the on-node e2e PoC tests (fail-open and multi-base) since check-hotfix
is otherwise never invoked at boot.