Skip to content

S3: surface bucket listing failures and fix multi-role object count#5035

Merged
shahzadhaider1 merged 4 commits into
mainfrom
s3-surface-list-errors
Jun 18, 2026
Merged

S3: surface bucket listing failures and fix multi-role object count#5035
shahzadhaider1 merged 4 commits into
mainfrom
s3-surface-list-errors

Conversation

@shahzadhaider1

@shahzadhaider1 shahzadhaider1 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Problem

S3 scans reported 0 objects scanned for buckets that contain objects, with no errors logged. Investigation showed ListObjectsV2 was failing with AccessDenied on every configured role, but because a role was assumed, the failure was logged at V(3) and never surfaced. The scans completed "successfully" while scanning nothing.

The suppression exists for list-all-buckets mode (role without a bucket list), where the scanner probes every bucket in the account and denials are expected. Applying it when buckets are explicitly configured hides real failures on targets the user asked to scan. Note that role-assumption (STS) failures also surface on this code path, since role credentials are resolved lazily.

Changes

Commit 1: surface listing failures for explicitly configured buckets

  • Listing failures are now logged at error level whenever the bucket was explicitly configured, regardless of role. Suppression to V(3) remains only for list-all-buckets mode.
  • Decision lives in listErrorsAreExpected, covered by a unit test.
  • New metric bucket_list_errors_total{bucket, role_arn} records every listing failure; previously a failed bucket left no trace in metrics.

Commit 2: accumulate object count across role passes

  • scanBuckets runs once per configured role and reset its object counter each pass, so the final progress message reported only the last role's count; a multi-role scan could report 0 objects scanned even when earlier roles scanned objects. The counter is now owned by Chunks and shared across passes.
  • Unit test pins the cumulative behavior.

Impact

Misconfigured access (IAM, bucket policy, or role trust policy) on an explicitly configured bucket is now visible at default log verbosity and in metrics, and the scan completion message reports the true total across all roles.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Medium Risk
Changes S3 scan observability and progress messaging for multi-role and misconfigured access; behavior is more correct but may surface more errors in logs and metrics for existing deployments.

Overview
Fixes S3 scans that could finish with no objects scanned and no visible failure when ListObjectsV2 was denied or STS failed, and when multiple roles were configured.

Listing failures are no longer always downgraded to verbose logs when a role is assumed. A new listErrorsAreExpected rule keeps suppression only for role + scan-all-buckets mode; if buckets are explicitly configured, listing errors (including lazy STS failures) log at error level. Each failure increments bucket_list_errors_total with bucket and role_arn labels.

Completion reporting now accumulates object counts across every role pass in Chunks instead of resetting per scanBuckets call, so the final “objects scanned” message reflects the full scan.

Reviewed by Cursor Bugbot for commit bc1846e. Bugbot is set up for automated code reviews on this repo. Configure here.

Listing failures were logged at V(3) whenever a role was assumed, hiding
access and role-assumption errors for buckets the user explicitly asked
to scan. Suppression now applies only in list-all-buckets mode, and a
bucket_list_errors_total metric records every listing failure.
@shahzadhaider1 shahzadhaider1 requested a review from a team June 12, 2026 16:02
@shahzadhaider1 shahzadhaider1 requested a review from a team as a code owner June 12, 2026 16:02
scanBuckets runs once per configured role and reset its object counter
each pass, so the final progress message only reflected the last role's
count. Multi-role scans could report 0 objects scanned even when earlier
roles scanned objects.
@shahzadhaider1 shahzadhaider1 force-pushed the s3-surface-list-errors branch from 6827420 to f9b729c Compare June 12, 2026 16:13

@mariduv mariduv left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that bucket name in prometheus seems like potentially high cardinality but several other metrics are already doing it so 👌

@MuneebUllahKhan222 MuneebUllahKhan222 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shahzadhaider1

Copy link
Copy Markdown
Contributor Author

1 test in git source is failing but it isn't related to the changes made in this PR, so I'll go ahead and merge the PR.

@shahzadhaider1 shahzadhaider1 merged commit 30d5bb9 into main Jun 18, 2026
15 of 16 checks passed
@shahzadhaider1 shahzadhaider1 deleted the s3-surface-list-errors branch June 18, 2026 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants