fix: resolve ARM token resource from AZURE_AUTHORITY_HOST for sovereign cloud ACR auth#27580

Open
atiasadir wants to merge 1 commit into argoproj:master from atiasadir:fix/sovereign-cloud-acr-workload-identity

@atiasadir atiasadir commented Apr 28, 2026

Summary

When using Azure Workload Identity (useAzureWorkloadIdentity: "true") with Helm OCI registries on Azure sovereign clouds, the repo-server fails to pull charts from ACR with 401 Unauthorized.

Root Cause

In util/helm/creds.go, the ARM token scope for the ACR refresh token exchange defaults to the commercial endpoint unless AZURE_ARM_TOKEN_RESOURCE is set explicitly:

armTokenScope := env.StringFromEnv("AZURE_ARM_TOKEN_RESOURCE", "https://management.core.windows.net")

On sovereign clouds, the ARM resource endpoint differs. The token is obtained with the correct authority (AZURE_AUTHORITY_HOST is properly injected by the WI webhook), but the token audience/scope targets the commercial ARM resource, which the sovereign ACR /oauth2/exchange endpoint rejects.

Fix

Resolve the ARM token resource from AZURE_AUTHORITY_HOST, the only cloud-identifying env var that the AKS Workload Identity mutating webhook injects into workload pods. (AZURE_ENVIRONMENT is NOT injected into pods, only into the webhook controller itself.)

Ref: https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html

| Cloud | Authority Host | ARM Token Resource |
|---|---|---|
| Public (Commercial) | login.microsoftonline.com | management.core.windows.net |
| US Government (FairFax/GCC) | login.microsoftonline.us | management.core.usgovcloudapi.net |
| China (Mooncake) | login.partner.microsoftonline.cn | management.core.chinacloudapi.cn |

For other cloud environments, operators can set the AZURE_ARM_TOKEN_RESOURCE env var explicitly on the repo-server.
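
The resolution order described in this section (explicit override first, then the authority-host mapping, then the commercial default) can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `resolveARMTokenResource` and `armResourceByAuthorityHost` are hypothetical names, the `https://` prefix on the sovereign resources is assumed by analogy with the commercial default, and AZURE_AUTHORITY_HOST is assumed to arrive either as a bare host or as a URL such as `https://login.microsoftonline.us/`.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// defaultARMTokenResource is the commercial default quoted under Root Cause.
const defaultARMTokenResource = "https://management.core.windows.net"

// armResourceByAuthorityHost mirrors the mapping listed in this PR description.
// The https:// scheme on the sovereign entries is an assumption.
var armResourceByAuthorityHost = map[string]string{
	"login.microsoftonline.com":        "https://management.core.windows.net",
	"login.microsoftonline.us":         "https://management.core.usgovcloudapi.net",
	"login.partner.microsoftonline.cn": "https://management.core.chinacloudapi.cn",
}

// resolveARMTokenResource (hypothetical helper) picks the ARM token resource:
// explicit override, then authority-host lookup, then the commercial default.
func resolveARMTokenResource(override, authorityHost string) string {
	if override != "" {
		return override
	}
	host := strings.TrimSpace(authorityHost)
	// If the value is a full URL, keep only its hostname.
	if u, err := url.Parse(host); err == nil && u.Host != "" {
		host = u.Hostname()
	}
	host = strings.ToLower(strings.TrimSuffix(host, "/"))
	if resource, ok := armResourceByAuthorityHost[host]; ok {
		return resource
	}
	// Unknown or empty authority host: keep the existing default.
	return defaultARMTokenResource
}

func main() {
	fmt.Println(resolveARMTokenResource(
		os.Getenv("AZURE_ARM_TOKEN_RESOURCE"),
		os.Getenv("AZURE_AUTHORITY_HOST"),
	))
}
```

Falling back to the commercial default for unrecognized hosts keeps existing commercial clusters unaffected, while any other cloud still needs the explicit override.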

Testing

  • Validated on a production Azure Government (FairFax) AKS cluster running ArgoCD v3.3.3
  • Before: 401 Unauthorized on every GenerateManifest call for Helm OCI charts
  • After: All charts pull successfully from Gov ACR
  • Commercial clusters unaffected (default value unchanged)

Checklist

  • Either (a) I have created an Enhancement Proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the EP process.
  • The title of the PR states what changed and the related issue number (used for the changelog).
  • I have signed off all my commits with DCO

@atiasadir atiasadir requested a review from a team as a code owner April 28, 2026 20:39
bunnyshell Bot commented Apr 28, 2026

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@atiasadir atiasadir force-pushed the fix/sovereign-cloud-acr-workload-identity branch 4 times, most recently from 926c4fb to aab6757 Compare April 28, 2026 21:36
@atiasadir atiasadir changed the title from "fix: resolve ARM token resource from Azure environment for sovereign cloud ACR auth" to "fix: resolve ARM token resource from AZURE_AUTHORITY_HOST for sovereign cloud ACR auth" Apr 28, 2026
codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.64%. Comparing base (42498e6) to head (a946ce3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| util/helm/creds.go | 50.00% | 5 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #27580   +/-   ##
=======================================
  Coverage   63.63%   63.64%           
=======================================
  Files         417      417           
  Lines       57079    57090   +11     
=======================================
+ Hits        36322    36333   +11     
+ Misses      17361    17360    -1     
- Partials     3396     3397    +1     


@atiasadir atiasadir force-pushed the fix/sovereign-cloud-acr-workload-identity branch from aab6757 to d4a4e20 Compare April 29, 2026 03:55
…gn cloud ACR auth

ArgoCD hardcodes the ARM token scope to the commercial endpoint
(https://management.core.windows.net) when exchanging a Workload Identity
access token for an ACR refresh token. On sovereign clouds, this causes
401 Unauthorized because the token audience does not match the cloud ARM
resource.

This change resolves the ARM token resource from AZURE_AUTHORITY_HOST,
which is the only cloud-identifying env var injected by the AKS Workload
Identity mutating webhook into workload pods.

Supported clouds (auto-resolved):
- Azure Public (login.microsoftonline.com) -> management.core.windows.net
- Azure US Government (login.microsoftonline.us) -> management.core.usgovcloudapi.net
- Azure China (login.partner.microsoftonline.cn) -> management.core.chinacloudapi.cn

For other cloud environments, set AZURE_ARM_TOKEN_RESOURCE explicitly.

Signed-off-by: Adir Atias <adatias@microsoft.com>
@atiasadir atiasadir force-pushed the fix/sovereign-cloud-acr-workload-identity branch from d4a4e20 to a946ce3 Compare April 30, 2026 09:38