[DRAFT] Fix pool renewal race #835

Draft
andrzej-jackowski-scylladb wants to merge 2 commits into scylladb:master from andrzej-jackowski-scylladb:fix-pool-renewal-race

Conversation

@andrzej-jackowski-scylladb

@dkropachev, @sylwiaszunejko, #317 continuously causes ScyllaDB CI failures. This is a draft PR that attempts to solve the problem, but I don't have a deep understanding of all the python-driver corner cases. Do you think this approach is worth pursuing?

=====

When pool creation races for the same host, a slower attempt can
overwrite a pool that another thread already published and close
connections with in-flight requests.

Capture the previous pool before connection setup, then compare
that state under the session lock before publishing the new pool.
If another thread changed the pool, discard the stale pool instead
of replacing the current one.

Keep pool removals behind the same lock so the check observes all
writers.

Fixes: #317
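
For reviewers, the idea above can be sketched roughly as follows. This is a toy model, not the driver's code: `PoolRegistry`, `FakePool`, `add_or_renew`, and `build_pool` are hypothetical names invented for illustration, and shutting down the replaced pool on a successful publish is an assumption of the sketch.

```python
import threading


class FakePool:
    """Hypothetical stand-in for a host connection pool."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def shutdown(self):
        self.closed = True


class PoolRegistry:
    """Toy host-to-pool map guarded by one lock (not the driver's Session)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pools = {}

    def current(self, host):
        with self._lock:
            return self._pools.get(host)

    def add_or_renew(self, host, build_pool):
        # Snapshot the published pool before the slow connection setup.
        with self._lock:
            previous = self._pools.get(host)

        new_pool = build_pool()  # slow step, deliberately outside the lock

        with self._lock:
            if self._pools.get(host) is previous:
                # Nobody changed the entry in the meantime: publish.
                self._pools[host] = new_pool
                to_close = previous  # assumption: the replaced pool is closed
            else:
                # Another writer published or removed a pool: ours is stale.
                to_close = new_pool

        # Shut down outside the lock so other writers are not blocked.
        if to_close is not None:
            to_close.shutdown()
        return to_close is not new_pool

    def remove(self, host):
        # Removals take the same lock so the comparison sees every writer.
        with self._lock:
            pool = self._pools.pop(host, None)
        if pool is not None:
            pool.shutdown()
        return pool
```

The comparison uses identity (`is`) on purpose: a pool that was removed and re-added for the same host counts as a change, even if the new pool looks equivalent.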

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

When pool creation races for the same host, a slower attempt can
overwrite a pool that another thread already published and close
connections with in-flight requests.

Capture the previous pool before connection setup, then compare
that state under the session lock before publishing the new pool.
If another thread changed the pool, discard the stale pool instead
of replacing the current one.

Keep pool removals behind the same lock so the check observes all
writers.

Fixes: scylladb#317

Add a deterministic unit test for the case where another thread
publishes a pool while a slower add attempt is still constructing
its pool.

This guards against closing in-flight connections by replacing the
pool that should remain current.

Refs: scylladb#317
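
One way such a deterministic test could be structured is sketched below. It does not reproduce the test in this PR: `Registry`, `Pool`, and the test name are hypothetical, and the registry is a minimal stand-in for the publish-if-unchanged logic. Two `threading.Event` objects force the interleaving, so the test does not rely on sleeps or timing.

```python
import threading
import unittest


class Pool:
    """Hypothetical pool stub that only records whether it was shut down."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def shutdown(self):
        self.closed = True


class Registry:
    """Hypothetical minimal publish-if-unchanged registry under test."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pools = {}

    def add(self, host, build):
        with self.lock:
            previous = self.pools.get(host)
        pool = build()  # slow step, outside the lock
        with self.lock:
            if self.pools.get(host) is previous:
                self.pools[host] = pool
                return True
        pool.shutdown()  # stale: another thread changed the entry
        return False


class PoolRaceTest(unittest.TestCase):
    def test_slow_add_does_not_replace_newer_pool(self):
        registry = Registry()
        slow_started = threading.Event()
        fast_published = threading.Event()
        slow_pool, fast_pool = Pool("slow"), Pool("fast")
        results = {}

        def slow_build():
            # Signal that the snapshot was taken, then wait for the fast add.
            slow_started.set()
            fast_published.wait(timeout=5)
            return slow_pool

        def run_slow():
            results["slow"] = registry.add("host1", slow_build)

        worker = threading.Thread(target=run_slow)
        worker.start()
        self.assertTrue(slow_started.wait(timeout=5))

        # The fast attempt publishes while the slow one is still building.
        results["fast"] = registry.add("host1", lambda: fast_pool)
        fast_published.set()
        worker.join(timeout=5)
        self.assertFalse(worker.is_alive())

        self.assertTrue(results["fast"])
        self.assertFalse(results["slow"])
        self.assertIs(registry.pools["host1"], fast_pool)
        self.assertFalse(fast_pool.closed)  # in-flight connections survive
        self.assertTrue(slow_pool.closed)   # stale pool was discarded
```

Saved as a module, this can be run with `python -m unittest`. The timeouts only bound the wait if something goes wrong; on the passing path every step is ordered by the events.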
@andrzej-jackowski-scylladb
Author

@dkropachev, please reassign the review if there are more appropriate developers on the driver team.

@Lorak-mmk Lorak-mmk self-requested a review April 29, 2026 15:41
@Lorak-mmk

I'll look into it tomorrow. I did think about how to solve the issue, and couldn't figure out anything sensible, so I'm looking forward to reading your solution.

Development

Successfully merging this pull request may close these issues.

Connection pool renewal after concurrent node bootstraps causes double statement execution

2 participants