[v26.1.x] kafka/cluster_link: fix deadlocks with kafka::cluster #30788
Merged
WillemKauf merged 6 commits on Jun 16, 2026
Allocating a `config` on the stack in this test blows it up due to the size of a `configuration` object. Use the exposed `config::make_config()` helper to allocate it on the heap instead. (cherry picked from commit 04dcdcb)
Demonstrates deadlocks present in `client::stop()`. A wedged `client` that calls `stop()` while waiting for an inflight schema registry request to finish is currently only awakened by the `cluster`'s `abort_source`. That `abort_source` is only aborted when `cluster::stop()` is called, which happens only _after_ the `client` manages to close its gate. In summary, the client's gate waits on the hung request, the request waits on the abort source, and the abort source waits on the gate being closed, leading to a deadlock. (cherry picked from commit fe75dbb)
Fixes the deadlocks in `client::stop()` demonstrated by the previous commit. Splitting `cluster::stop()` into two functions fixes the case where a hung request waits for the `cluster`'s `abort_source` to fire, while the abort itself is stuck behind the closing of the gate of the client that issued the hung request. (cherry picked from commit 6e297d2)
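The ordering described above can be modeled with a minimal, standard-library-only sketch. This is not the code from the commit: `toy_abort_source`, `toy_gate`, `toy_cluster`, `toy_client`, and `run_two_phase_shutdown` are hypothetical stand-ins, and threads replace the real asynchronous tasks. Only the name `shutdown_input()` is taken from this PR; its real signature and behavior are assumed here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an abort source: waiters block until aborted.
class toy_abort_source {
public:
    void request_abort() {
        std::lock_guard<std::mutex> lk(_m);
        _aborted = true;
        _cv.notify_all();
    }
    void wait_for_abort() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _aborted; });
    }
private:
    std::mutex _m;
    std::condition_variable _cv;
    bool _aborted = false;
};

// Hypothetical stand-in for a gate: close() blocks until all holders leave.
class toy_gate {
public:
    void enter() {
        std::lock_guard<std::mutex> lk(_m);
        ++_holders;
    }
    void leave() {
        std::lock_guard<std::mutex> lk(_m);
        --_holders;
        _cv.notify_all();
    }
    int close() {
        std::unique_lock<std::mutex> lk(_m);
        _cv.wait(lk, [this] { return _holders == 0; });
        return _holders;
    }
private:
    std::mutex _m;
    std::condition_variable _cv;
    int _holders = 0;
};

// Assumed two-phase shutdown: phase 1 fires the abort, phase 2 cleans up.
struct toy_cluster {
    toy_abort_source as;
    void shutdown_input() { as.request_abort(); }
    void stop() { /* release remaining resources once nothing is inflight */ }
};

class toy_client {
public:
    explicit toy_client(toy_cluster& c) : _cluster(c) {}

    // Simulates a hung request: it holds the gate until the abort fires.
    void start_hung_request() {
        _gate.enter();
        _request = std::thread([this] {
            _cluster.as.wait_for_abort();
            _finished = true;
            _gate.leave();
        });
    }

    void stop() {
        _cluster.shutdown_input();        // phase 1: wake the hung request first
        int remaining = _gate.close();    // the gate can now drain
        assert(remaining == 0);
        _request.join();
        _cluster.stop();                  // phase 2: final teardown
    }

    bool request_finished() const { return _finished; }

private:
    toy_cluster& _cluster;
    toy_gate _gate;
    std::thread _request;
    std::atomic<bool> _finished{false};
};

// Returns true if the hung request was released and shutdown completed.
bool run_two_phase_shutdown() {
    toy_cluster cluster;
    toy_client client(cluster);
    client.start_hung_request();
    client.stop();
    return client.request_finished();
}
```

In this model, moving the `shutdown_input()` call after `_gate.close()` would block forever, which mirrors the cycle the commit message describes.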
Demonstrates deadlocks present in `link::stop()`. A wedged `link` that calls `stop()` while waiting for an inflight schema registry request to finish is currently only awakened by the `cluster`'s `abort_source`. That `abort_source` is only aborted when `cluster::stop()` is called, which happens only _after_ the `link` manages to close its gate. In summary, the link's gate waits on the hung request, the request waits on the abort source, and the abort source waits on the gate being closed, leading to a deadlock. (cherry picked from commit bbcc4f5)
Same deadlock as present in `kafka::client::stop()` - we need to call `cluster::shutdown_input()` before attempting to close any gates, since a hung task will cause us to be stuck forever. (cherry picked from commit 5b7cc61)
WillemKauf approved these changes on Jun 12, 2026
CI test results: build#85733, build#85762
We were writing all of these cases individually when in reality, all shutdown errors probably require nothing more than a `DEBUG` log line and an early return. Commit 6e297d2 resulted in a new `broken_named_semaphore` exception being propagated through this path, leading to `ERROR`s in this path due to the unknown exception route being taken. Future proof by checking `ssx::is_shutdown_exception()` and logging & early returning. (cherry picked from commit 33b107f)
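The "check once, log at `DEBUG`, return early" pattern can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the exception types, `toy_is_shutdown_exception`, and `classify_failure` are hypothetical, and the real `ssx::is_shutdown_exception()` may have a different signature and recognize a different set of exceptions.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical exception types standing in for shutdown-related errors.
struct toy_gate_closed : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct toy_abort_requested : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct toy_broken_named_semaphore : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a single "is this a shutdown error?" predicate.
bool toy_is_shutdown_exception(const std::exception_ptr& e) {
    try {
        std::rethrow_exception(e);
    } catch (const toy_gate_closed&) {
        return true;
    } catch (const toy_abort_requested&) {
        return true;
    } catch (const toy_broken_named_semaphore&) {
        return true;
    } catch (...) {
        return false;
    }
}

enum class log_level { debug, error };

// One check covers every shutdown error, so a newly propagated shutdown
// exception only needs to be added to the predicate, not to each handler.
log_level classify_failure(const std::exception_ptr& e) {
    if (toy_is_shutdown_exception(e)) {
        return log_level::debug; // expected during shutdown: log and return early
    }
    return log_level::error;     // anything else is still an unknown failure
}
```

A caller would pass `std::current_exception()` from inside a `catch (...)` block and pick the log level from the result.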
pgellert approved these changes on Jun 15, 2026
Fix has been in |
Backport of PR #30784