S3: surface bucket listing failures and fix multi-role object count #5035
Merged
Listing failures were logged at V(3) whenever a role was assumed, hiding access and role-assumption errors for buckets the user explicitly asked to scan. Suppression now applies only in list-all-buckets mode, and a bucket_list_errors_total metric records every listing failure.
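A minimal Go sketch of the suppression rule described above. The helper name `listErrorsAreExpected` comes from this PR, but its signature here (a role ARN plus the configured bucket list) and the `logLevelForListError` helper are assumptions for illustration. This is not the merged code.

```go
package main

import "fmt"

// listErrorsAreExpected sketches the rule from the PR. The signature is an
// assumption: the merged helper may take different arguments.
// Errors are "expected" only when a role is assumed AND no buckets were
// configured (list-all-buckets mode), where the scanner probes every bucket
// in the account and AccessDenied on some of them is normal.
func listErrorsAreExpected(roleArn string, configuredBuckets []string) bool {
	return roleArn != "" && len(configuredBuckets) == 0
}

// logLevelForListError is a hypothetical helper that maps the rule to a
// verbosity: quiet V(3) logging for expected failures, error level otherwise.
func logLevelForListError(roleArn string, configuredBuckets []string) string {
	if listErrorsAreExpected(roleArn, configuredBuckets) {
		return "V(3)"
	}
	return "error"
}

func main() {
	role := "arn:aws:iam::123456789012:role/scanner" // hypothetical role ARN

	fmt.Println(logLevelForListError(role, nil))                   // list-all-buckets mode: V(3)
	fmt.Println(logLevelForListError(role, []string{"my-bucket"})) // explicit bucket: error
	fmt.Println(logLevelForListError("", []string{"my-bucket"}))   // no role assumed: error
}
```

Under this rule, assuming a role no longer implies quiet logging on its own. Suppression applies only when the user also left the bucket list empty.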
scanBuckets runs once per configured role and previously reset its object counter on each pass, so the final progress message reflected only the last role's count. Multi-role scans could report 0 objects scanned even when earlier roles had scanned objects.
Force-pushed from 6827420 to f9b729c.
mariduv approved these changes on Jun 15, 2026.
mariduv (Contributor) left a comment:
I'd say the bucket name as a Prometheus label is potentially high cardinality, but several other metrics already do it, so 👌
Problem
S3 scans reported `0 objects scanned` for buckets that contain objects, with no errors logged. Investigation showed `ListObjectsV2` was failing with `AccessDenied` on every configured role, but because a role was assumed, the failure was logged at `V(3)` and never surfaced. The scans completed "successfully" while scanning nothing.

The suppression exists for list-all-buckets mode (a role without a bucket list), where the scanner probes every bucket in the account and denials are expected. Applying it when buckets are explicitly configured hides real failures on targets the user asked to scan. Note that role-assumption (STS) failures also surface on this code path, since role credentials are resolved lazily.
Changes
Commit 1: surface listing failures for explicitly configured buckets
- Listing failures for explicitly configured buckets are logged at error level; `V(3)` remains only for list-all-buckets mode.
- The decision is made by a new `listErrorsAreExpected` rule, covered by a unit test.
- A new metric, `bucket_list_errors_total{bucket, role_arn}`, records every listing failure; previously a failed bucket left no trace in metrics.

Commit 2: accumulate object count across role passes

`scanBuckets` runs once per configured role and previously reset its object counter on each pass, so the final progress message reported only the last role's count; a multi-role scan could report `0 objects scanned` even when earlier roles had scanned objects. The counter is now owned by `Chunks` and shared across passes.

Impact
Misconfigured access (IAM, bucket policy, or role trust policy) on an explicitly configured bucket is now visible at default log verbosity and in metrics, and the scan completion message reports the true total across all roles.
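The following stdlib-only Go sketch shows what a counter labeled by `bucket` and `role_arn` records. In the PR this is the Prometheus metric `bucket_list_errors_total`. The `listErrorCounter` type below is a hypothetical in-memory stand-in, not the Prometheus client API.

```go
package main

import (
	"fmt"
	"sync"
)

// listErrorKey holds the two label values named in the PR: bucket and role_arn.
type listErrorKey struct {
	bucket  string
	roleArn string
}

// listErrorCounter is a hypothetical in-memory stand-in for a labeled
// counter such as bucket_list_errors_total{bucket, role_arn}.
type listErrorCounter struct {
	mu     sync.Mutex
	counts map[listErrorKey]uint64
}

func newListErrorCounter() *listErrorCounter {
	return &listErrorCounter{counts: make(map[listErrorKey]uint64)}
}

// inc records one listing failure. It runs on every failure, whether or not
// the log line was downgraded to V(3), so the metric never misses one.
func (c *listErrorCounter) inc(bucket, roleArn string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[listErrorKey{bucket, roleArn}]++
}

// get returns the count for one label combination (0 if never seen).
func (c *listErrorCounter) get(bucket, roleArn string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[listErrorKey{bucket, roleArn}]
}

// series reports how many label combinations exist; in Prometheus each one
// is a separate time series.
func (c *listErrorCounter) series() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.counts)
}

func main() {
	m := newListErrorCounter()
	roleA := "arn:aws:iam::123456789012:role/a" // hypothetical ARNs
	roleB := "arn:aws:iam::123456789012:role/b"

	m.inc("my-bucket", roleA)
	m.inc("my-bucket", roleB)
	m.inc("my-bucket", roleA)
	fmt.Println(m.get("my-bucket", roleA), m.get("my-bucket", roleB), m.series()) // 2 1 2
}
```

Each distinct (bucket, role_arn) pair becomes its own series, so the number of series grows with buckets times roles. That growth is the cardinality trade-off the reviewer pointed out.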
Checklist:
- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?

Note
Medium Risk
Changes S3 scan observability and progress messaging for multi-role and misconfigured access; behavior is more correct but may surface more errors in logs and metrics for existing deployments.
Overview
Fixes S3 scans that could finish with no objects scanned and no visible failure when `ListObjectsV2` was denied or STS failed, and when multiple roles were configured.

Listing failures are no longer always downgraded to verbose logs when a role is assumed. A new `listErrorsAreExpected` rule keeps suppression only for role + scan-all-buckets mode; if buckets are explicitly configured, listing errors (including lazy STS failures) log at error level. Each failure increments `bucket_list_errors_total` with `bucket` and `role_arn` labels.

Completion reporting now accumulates object counts across every role pass in `Chunks` instead of resetting per `scanBuckets` call, so the final “objects scanned” message reflects the full scan.

Reviewed by Cursor Bugbot for commit bc1846e. Bugbot is set up for automated code reviews on this repo.