perf(objstore): paginate S3 scan and share a cached scanner client#1209
perf(objstore): paginate S3 scan and share a cached scanner client#1209ksaurabhAparavi wants to merge 1 commit into
Conversation
Replace the StartAfter + result-equality dedup loop with proper ListObjectsV2 continuation-token pagination, so large buckets are walked page by page instead of relying on fragile end-of-listing heuristics. Cache the S3 client and share it across the scanner threads (guarded by a mutex, reset on list errors so the next scan re-connects), and raise ClientConfiguration.maxConnections to 64 so the shared connection pool does not serialize the scanner threads. Drop the per-object Content-Type HEAD request from processEntry; owner metadata is now fetched inline via ListObjectsV2 FetchOwner, removing a HEAD round-trip per object.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR optimizes S3 object-store bucket scanning by caching a shared S3 client across scanner threads, increasing the connection pool size, eliminating per-object HEAD requests, and replacing result-comparison pagination heuristics with proper ChangesS3 Scan Performance Optimization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Infer (1.2.0)packages/server/engine-lib/engLib/store/endpoints/objstore/base/scan.cpppackages/server/engine-lib/engLib/store/endpoints/objstore/base/scan.cpp:31:10: fatal error: 'engLib/eng.h' file not found ... [truncated 1151 characters] ... tem" "/usr/local/include" "-internal-isystem" packages/server/engine-lib/engLib/store/endpoints/objstore/base/base.cpppackages/server/engine-lib/engLib/store/endpoints/objstore/base/base.cpp:30:10: fatal error: 'engLib/eng.h' file not found ... [truncated 1151 characters] ... tem" "/usr/local/include" "-internal-isystem" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🤖 Internal: Discord sync markerAuto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete. |
Summary
StartAfter+ page result-equality dedup loop with properListObjectsV2continuation-token pagination.ClientConfiguration.maxConnectionsto 64 so the shared pool does not serialize threads.Content-TypeHEAD request fromprocessEntry; owner metadata is fetched inline viaListObjectsV2 FetchOwner, removing a round-trip per object.The generic
objstoreconnector inherits the same base scan, so it benefits too.Performance
Measured against a live bucket:
Type
refactor / perf
Testing
./builder testpasses (not run)Checklist
Linked Issue
Fixes #1208
Summary by CodeRabbit
Bug Fixes
Refactor