Skip to content

[Proposal] Add token acquisition metrics that cover both successful and failed t…#5928

Draft
ssmelov wants to merge 4 commits into
AzureAD:mainfrom
ssmelov:feature/add_extended_duration_metrics_that_include_success_and_failures
Draft

[Proposal] Add token acquisition metrics that cover both successful and failed t…#5928
ssmelov wants to merge 4 commits into
AzureAD:mainfrom
ssmelov:feature/add_extended_duration_metrics_that_include_success_and_failures

Conversation

@ssmelov

@ssmelov ssmelov commented Apr 15, 2026

Copy link
Copy Markdown
Collaborator

Add token acquisition metrics that cover both successful and failed token acquisition outcomes and close an observability gap by recording IDP network latency for background refresh.

Copilot AI review requested due to automatic review settings April 15, 2026 15:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an opt-in “extended token acquisition metrics” mode to improve OpenTelemetry coverage across both successful and failed token acquisitions, and closes a gap by capturing IDP network latency for background refresh successes.

Changes:

  • Introduces V2 OTel histograms (total duration + HTTP duration) tagged with Succeeded / HttpStatusCode, gated by a new config flag.
  • Adds a public builder opt-in WithExtendedTokenAcquisitionMetrics() and threads the flag through OTel instrumentation APIs.
  • Logs HTTP duration for proactive background refresh successes.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/client/Microsoft.Identity.Client/TelemetryCore/TelemetryConstants.cs Adds a new telemetry tag constant for HTTP status code.
src/client/Microsoft.Identity.Client/TelemetryCore/OpenTelemetry/NullOtelInstrumentation.cs Updates no-op instrumentation to match new extended-metrics API surface.
src/client/Microsoft.Identity.Client/TelemetryCore/OpenTelemetry/IOtelInstrumentation.cs Extends the instrumentation interface for V2 metrics + HTTP duration logging and failure context.
src/client/Microsoft.Identity.Client/Platforms/Features/OpenTelemetry/OtelInstrumentation.cs Implements V2 histograms, extended/fallback emission logic, and updated failure metric recording.
src/client/Microsoft.Identity.Client/Internal/Requests/SilentRequestHelper.cs Emits success HTTP duration for proactive background refresh; updates failure logging signature.
src/client/Microsoft.Identity.Client/Internal/Requests/RequestBase.cs Captures total duration for failures and passes HTTP status code into failure telemetry.
src/client/Microsoft.Identity.Client/AppConfig/BaseAbstractApplicationBuilder.cs Adds the public opt-in builder method for extended token acquisition metrics.
src/client/Microsoft.Identity.Client/AppConfig/ApplicationConfiguration.cs Adds configuration toggle IsExtendedTokenAcquisitionMetricsEnabled.
src/client/Microsoft.Identity.Client/PublicApi/*/PublicAPI.Unshipped.txt Declares the new public builder API across TFMs.

Comment on lines +110 to +111
int httpStatusCode = ex is MsalServiceException serviceEx ? serviceEx.StatusCode : 0;
LogFailureTelemetryToOtel(apiEvent, apiEvent.ApiErrorCode, httpStatusCode, requestStopwatch.ElapsedMilliseconds + measureTelemetryDurationResult.Milliseconds);

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

measureTelemetryDurationResult.Milliseconds is the millisecond component (0–999) of the TimeSpan, not the full duration. This will under-report total duration for telemetry whenever measureTelemetryDurationResult is >= 1 second. Use measureTelemetryDurationResult.TotalMilliseconds (cast/rounded to long as needed) instead of .Milliseconds when computing totalDurationInMs.

Copilot uses AI. Check for mistakes.
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ErrorCode, errorCode),
new(TelemetryConstants.ApiId, apiEvent.ApiId),
new(TelemetryConstants.CallerSdkId, apiEvent.CallerSdkApiId ?? string.Empty + "," + apiEvent.CallerSdkVersion ?? string.Empty),

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CallerSdkId value composition is incorrect due to ?? / + precedence and will also drop the version when CallerSdkApiId is non-null. Build the combined value explicitly (e.g., via interpolation or parentheses) so it always includes both parts with empty-string defaults.

Suggested change
new(TelemetryConstants.CallerSdkId, apiEvent.CallerSdkApiId ?? string.Empty + "," + apiEvent.CallerSdkVersion ?? string.Empty),
new(TelemetryConstants.CallerSdkId, $"{apiEvent.CallerSdkApiId ?? string.Empty},{apiEvent.CallerSdkVersion ?? string.Empty}"),

Copilot uses AI. Check for mistakes.
Comment on lines +167 to +173
s_durationInExtensionInMs.Value.Record(authResultMetadata.DurationCreatingExtendedTokenInUs,
new(TelemetryConstants.MsalVersion, MsalIdHelper.GetMsalVersion()),
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ApiId, apiId),
new(TelemetryConstants.TokenSource, authResultMetadata.TokenSource),
new(TelemetryConstants.CacheLevel, authResultMetadata.CacheLevel),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType));

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_durationInExtensionInMs is defined as an “InMs” histogram (unit: "ms"), but the recorded value is DurationCreatingExtendedTokenInUs (microseconds). Convert to milliseconds before recording (or record into a microseconds-based histogram) to avoid emitting incorrect latency values.

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +131
serviceBundle.PlatformProxy.OtelInstrumentation.LogFailureMetrics(
serviceBundle.PlatformProxy.GetProductName(),
apiEvent,
ex.ErrorCode,
apiEvent.ApiId,
callerSdkId,
callerSdkVersion,
CacheRefreshReason.ProactivelyRefreshed,
apiEvent.TokenType);
ex.StatusCode,
totalDurationInMs: 0,
serviceBundle.Config.IsExtendedTokenAcquisitionMetricsEnabled);

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In extended-metrics mode, OtelInstrumentation.LogFailureMetrics only records the V2 total-duration histogram when totalDurationInMs > 0. Passing totalDurationInMs: 0 here means background refresh failures won’t contribute to the new “cover failures” total-duration metric. If the goal is to cover failed outcomes for proactive refresh too, capture and pass a meaningful elapsed duration in this flow (or adjust the recording condition to allow 0 only if that’s an intentional signal).

Copilot uses AI. Check for mistakes.
Comment on lines +260 to +282
if (!isExtendedMetricsEnabled)
{
if (s_durationInHttp.Value.Enabled)
{
s_durationInHttp.Value.Record(authResultMetadata.DurationInHttpInMs,
new(TelemetryConstants.MsalVersion, MsalIdHelper.GetMsalVersion()),
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ApiId, apiId),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType));
}
}
else
{
if (s_durationInHttpV2.Value.Enabled)
{
s_durationInHttpV2.Value.Record(authResultMetadata.DurationInHttpInMs,
new(TelemetryConstants.MsalVersion, MsalIdHelper.GetMsalVersion()),
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ApiId, apiId),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType),
new(TelemetryConstants.HttpStatusCode, 200));
}
}

Copilot AI Apr 15, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extended/non-extended branching and tag sets for HTTP duration are now duplicated between LogSuccessMetrics and LogSuccessHttpDuration. Consider centralizing the “record HTTP duration” logic into a single private helper to reduce the risk of the two paths drifting (e.g., tag changes, enablement checks, or future histogram revisions).

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 16, 2026 18:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

Comment on lines 68 to 74
if (isBrokerConfigured && !isAccountSourceDeviceCodeFlow)
{
_logger.Info("Broker is configured and enabled, attempting to use broker instead.");
AuthenticationRequestParameters.RequestContext.ApiEvent.TokenSource = TokenSource.Broker;
var brokerResult = await _brokerStrategyLazy.Value.ExecuteAsync(cancellationToken).ConfigureAwait(false);

// fallback to local cache if broker fails

Copilot AI Apr 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApiEvent.TokenSource is set to Broker before attempting the broker flow, but it is never reset when the broker returns null and the code falls back to the local cache. This can mis-tag failure telemetry (and any other ApiEvent-based telemetry) as coming from the broker even when the cache path was used. Set TokenSource back to Cache before _clientStrategy.ExecuteAsync(...) or right before the "Attempting to acquire token using local cache" path.

Copilot uses AI. Check for mistakes.
Comment on lines +726 to +740
case "MsalTotalDuration.2":
Trace.WriteLine("Verify the metrics captured for MsalTotalDuration.2 histogram.");
Assert.AreEqual(MetricType.Histogram, exportedItem.MetricType);

expectedTags.Add(TelemetryConstants.MsalVersion);
expectedTags.Add(TelemetryConstants.Platform);
expectedTags.Add(TelemetryConstants.ApiId);
expectedTags.Add(TelemetryConstants.CacheRefreshReason);
expectedTags.Add(TelemetryConstants.TokenType);
expectedTags.Add(TelemetryConstants.Succeeded);

foreach (var metricPoint in exportedItem.GetMetricPoints())
{
AssertTags(metricPoint.Tags, expectedTags);
}

Copilot AI Apr 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerifyMetrics expects MsalTotalDuration.2 to have only a small set of tags, but the implementation currently records additional tags (e.g., TokenSource/CacheLevel/ErrorCode). With the current AssertTags (exact tag count), this case will fail as soon as a test starts validating MsalTotalDuration.2. Align the expected tag list here with the actual emitted tags (or adjust the production tag set to match the intended contract).

Copilot uses AI. Check for mistakes.
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ErrorCode, errorCode),
new(TelemetryConstants.ApiId, apiEvent.ApiId),
new(TelemetryConstants.CallerSdkId, apiEvent.CallerSdkApiId ?? string.Empty + "," + apiEvent.CallerSdkVersion ?? string.Empty),

Copilot AI Apr 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CallerSdkId tag value construction uses ?? and string concatenation without parentheses, so it won’t reliably include both caller SDK id + version and can yield unexpected values (e.g., "," when id is null). Use an explicit format (e.g., interpolate both null-coalesced pieces) to avoid operator-precedence bugs and keep the tag value consistent.

Suggested change
new(TelemetryConstants.CallerSdkId, apiEvent.CallerSdkApiId ?? string.Empty + "," + apiEvent.CallerSdkVersion ?? string.Empty),
new(TelemetryConstants.CallerSdkId, $"{apiEvent.CallerSdkApiId ?? string.Empty},{apiEvent.CallerSdkVersion ?? string.Empty}"),

Copilot uses AI. Check for mistakes.
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ApiId, apiEvent.ApiId),
new(TelemetryConstants.TokenSource, apiEvent.TokenSource),
new(TelemetryConstants.CacheLevel, string.Empty),

Copilot AI Apr 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MsalTotalDuration.2 records the CacheLevel tag as string.Empty on failures. This makes the attribute type inconsistent with the success path (which uses the CacheLevel enum) and can split/invalidly aggregate metric streams in OTel exporters. Record CacheLevel.None (or another CacheLevel enum value) instead of a string.

Suggested change
new(TelemetryConstants.CacheLevel, string.Empty),
new(TelemetryConstants.CacheLevel, CacheLevel.None),

Copilot uses AI. Check for mistakes.
Comment on lines +209 to +213
new(TelemetryConstants.TokenSource, authResultMetadata.TokenSource),
new(TelemetryConstants.CacheLevel, cacheLevel),
new(TelemetryConstants.CacheRefreshReason, authResultMetadata.CacheRefreshReason),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType),
new(TelemetryConstants.ErrorCode, string.Empty),

Copilot AI Apr 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extended MsalTotalDuration.2 histogram is documented/tested as being tagged mainly with Succeeded/CacheRefreshReason/TokenType, but the implementation also adds TokenSource, CacheLevel, and ErrorCode. If this is intentional, update the public XML docs/tests; if not, consider removing extra dimensions (especially ErrorCode) to avoid high-cardinality histograms and reduce telemetry cost.

Suggested change
new(TelemetryConstants.TokenSource, authResultMetadata.TokenSource),
new(TelemetryConstants.CacheLevel, cacheLevel),
new(TelemetryConstants.CacheRefreshReason, authResultMetadata.CacheRefreshReason),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType),
new(TelemetryConstants.ErrorCode, string.Empty),
new(TelemetryConstants.CacheRefreshReason, authResultMetadata.CacheRefreshReason),
new(TelemetryConstants.TokenType, authResultMetadata.TelemetryTokenType),

Copilot uses AI. Check for mistakes.
}
catch
{
LastRequestDurationInMs = sw.ElapsedMilliseconds;

@neha-bhargava neha-bhargava May 6, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally block might be a good place to place the duration here that will cover all the cases. Here and in the RequestBase.cs as well

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep it scoped to the network‑bound portion — putting it in finally would also include response body read.

Copilot AI review requested due to automatic review settings May 13, 2026 14:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Comment on lines +107 to +110
serviceBundle.PlatformProxy.OtelInstrumentation.LogSuccessHttpDuration(
serviceBundle.PlatformProxy.GetProductName(),
apiEvent.ApiId,
authResult.AuthenticationResultMetadata);
Comment on lines +173 to +175
if (requestContext.ApiEvent != null)
{
requestContext.ApiEvent.DurationInHttpInMs += _httpManager.LastRequestDurationInMs;

if (_isExtendedMetricsEnabled)
{
if (totalDurationInMs > 0 && s_durationTotalV2.Value.Enabled)
new(TelemetryConstants.Platform, platform),
new(TelemetryConstants.ApiId, apiEvent.ApiId),
new(TelemetryConstants.TokenSource, apiEvent.TokenSource),
new(TelemetryConstants.CacheLevel, string.Empty),
Comment on lines +432 to +435
CollectionAssert.Contains(
_exportedMetrics.Select(m => m.Name).ToList(),
"MsalDurationInHttp.1A",
"Background refresh HTTP duration must be recorded in MsalDurationInHttp.1A");
Comment on lines +736 to +739
expectedTags.Add(TelemetryConstants.ApiId);
expectedTags.Add(TelemetryConstants.CacheRefreshReason);
expectedTags.Add(TelemetryConstants.TokenType);
expectedTags.Add(TelemetryConstants.Succeeded);
}

bool isTimeout = string.Equals(errorCode, MsalError.RequestTimeout, StringComparison.OrdinalIgnoreCase);
if ((httpStatusCode > 0 || (isTimeout && apiEvent.DurationInHttpInMs > 0)) && s_durationInHttpV2.Value.Enabled)
Comment on lines +361 to +367
// 2. Acquire token — failure
_harness.HttpManager.AddTokenResponse(TokenResponseType.InvalidClient);
await AssertException.TaskThrowsAsync<MsalServiceException>(
() => cca.AcquireTokenForClient(TestConstants.s_scopeForAnotherResource)
.WithExtraQueryParameters(extraQueryParams)
.WithTenantId(TestConstants.Utid)
.ExecuteAsync()).ConfigureAwait(false);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants