Describe the bug
We are experiencing a failover malfunction in the AWS Advanced JDBC Wrapper for applications running on Open Liberty.
When a failover occurs, the failover plugins (failover and failover2) drive the application into a non-functional state from which it never recovers while under load.
Without the failover plugins, the application returns to a healthy state after the expected blocking period, which stays well under a minute and is governed by the DNS TTL settings.
Therefore, enabling the failover plugins in the wrapper breaks applications running on Open Liberty whenever a failover occurs.
To reproduce this issue, we built a minimal project: https://github.com/hlond/open-liberty-rds-failover-test
Expected Behavior
We expect the behavior of version 2.6.0, the last AWS Advanced JDBC Wrapper version that works with the failover plugins enabled on Open Liberty. With 2.6.0, the cluster failover is detected quickly and the application recovers as expected:
execution: local
script: tests/endpoint-wrapper-test.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 1m30s max duration (incl. graceful stop):
* default: 1 looping VUs for 1m0s (gracefulStop: 30s)
...
running (0m18.0s), 1/1 VUs, 16 complete and 0 interrupted iterations
default [ 30% ] 1 VUs 0m18.0s/1m0s
time="2026-04-10T07:54:56Z" level=info msg="200 OK { duration: 6.086638, blocked: 0.00541, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.018116, waiting: 6.000049, receiving: 0.068473 }" source=console
time="2026-04-10T07:54:56Z" level=info msg="\"2026-04-10T07:54:56.450Z\"" source=console
running (0m19.0s), 1/1 VUs, 17 complete and 0 interrupted iterations
default [ 32% ] 1 VUs 0m19.0s/1m0s
running (0m20.0s), 1/1 VUs, 18 complete and 0 interrupted iterations
default [ 33% ] 1 VUs 0m20.0s/1m0s
running (0m21.0s), 1/1 VUs, 18 complete and 0 interrupted iterations
default [ 35% ] 1 VUs 0m21.0s/1m0s
time="2026-04-10T07:54:59Z" level=info msg="500 Internal Server Error { duration: 2214.257015, blocked: 0.005508, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.019793, waiting: 2213.784239, receiving: 0.452983 }" source=console
time="2026-04-10T07:54:59Z" level=info msg="\"2026-04-10T07:54:59.665Z\"" source=console
running (0m22.0s), 1/1 VUs, 18 complete and 0 interrupted iterations
default [ 37% ] 1 VUs 0m22.0s/1m0s
time="2026-04-10T07:55:00Z" level=info msg="200 OK { duration: 6.198184, blocked: 0.439266, looking_up: 0, connecting: 0.376779, tls_handshaking: 0, sending: 0.05514, waiting: 5.932385, receiving: 0.210659 }" source=console
time="2026-04-10T07:55:00Z" level=info msg="\"2026-04-10T07:55:00.673Z\"" source=console
...
█ TOTAL RESULTS
checks_total.......: 57 0.941045/s
checks_succeeded...: 98.24% 56 out of 57
checks_failed......: 1.75% 1 out of 57
✗ status is 200
↳ 98% — ✓ 56 / ✗ 1
...
What plugins are used? What other connection properties were set?
wrapperPlugins: auroraConnectionTracker,efm2,failover2,iam; wrapperDialect: aurora-pg; failureDetectionCount: 1; failureDetectionInterval: 500; failoverTimeoutMs: 30000; failoverWriterReconnectIntervalMs: 1000; failoverReaderConnectTimeoutMs: 1000; failoverClusterTopologyRefreshRateMs: 1000; stringType: unspecified
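For readers who want to try the same settings outside Open Liberty, here is a minimal sketch of how these connection properties could be passed to the wrapper through plain JDBC. The class name and the cluster endpoint in the URL are placeholders, and the test project may configure the data source differently (for example, in server.xml). The sketch only builds and prints the properties; it does not open a connection.

```java
import java.util.Properties;

public class WrapperConfigSketch {

    // Placeholder endpoint (hypothetical); replace with the real Aurora cluster endpoint.
    static final String URL =
        "jdbc:aws-wrapper:postgresql://my-cluster.cluster-XXXX.region.rds.amazonaws.com:5432/postgres";

    static Properties wrapperProperties() {
        Properties props = new Properties();
        // Plugin chain and dialect, as listed in this report.
        props.setProperty("wrapperPlugins", "auroraConnectionTracker,efm2,failover2,iam");
        props.setProperty("wrapperDialect", "aurora-pg");
        // Failure detection settings, as listed in this report.
        props.setProperty("failureDetectionCount", "1");
        props.setProperty("failureDetectionInterval", "500");
        // Failover settings, as listed in this report.
        props.setProperty("failoverTimeoutMs", "30000");
        props.setProperty("failoverWriterReconnectIntervalMs", "1000");
        props.setProperty("failoverReaderConnectTimeoutMs", "1000");
        props.setProperty("failoverClusterTopologyRefreshRateMs", "1000");
        // PostgreSQL driver property, passed through by the wrapper.
        props.setProperty("stringType", "unspecified");
        return props;
    }

    public static void main(String[] args) {
        Properties props = wrapperProperties();
        // With the wrapper on the classpath, a connection would be opened with
        // java.sql.DriverManager.getConnection(URL, props); here we only print the settings.
        System.out.println(URL);
        props.stringPropertyNames().stream().sorted()
            .forEach(key -> System.out.println(key + "=" + props.getProperty(key)));
    }
}
```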
Current Behavior
With version 2.6.1 and later, the application responds only with 500 status codes after the failover:
execution: local
script: tests/endpoint-wrapper-test.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 1m30s max duration (incl. graceful stop):
* default: 1 looping VUs for 1m0s (gracefulStop: 30s)
...
running (0m24.0s), 1/1 VUs, 22 complete and 0 interrupted iterations
default [ 40% ] 1 VUs 0m24.0s/1m0s
time="2026-04-10T07:59:57Z" level=info msg="200 OK { duration: 11.780068, blocked: 0.005679, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.018721, waiting: 11.685849, receiving: 0.075498 }" source=console
time="2026-04-10T07:59:57Z" level=info msg="\"2026-04-10T07:59:57.665Z\"" source=console
running (0m25.0s), 1/1 VUs, 23 complete and 0 interrupted iterations
default [ 42% ] 1 VUs 0m25.0s/1m0s
time="2026-04-10T07:59:58Z" level=info msg="500 Internal Server Error { duration: 258.530431, blocked: 0.004885, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.016515, waiting: 258.421381, receiving: 0.092535 }" source=console
time="2026-04-10T07:59:58Z" level=info msg="\"2026-04-10T07:59:58.925Z\"" source=console
running (0m26.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 43% ] 1 VUs 0m26.0s/1m0s
time="2026-04-10T07:59:59Z" level=info msg="500 Internal Server Error { duration: 15.942459, blocked: 0.437148, looking_up: 0, connecting: 0.374119, tls_handshaking: 0, sending: 0.053102, waiting: 15.760584, receiving: 0.128773 }" source=console
time="2026-04-10T07:59:59Z" level=info msg="\"2026-04-10T07:59:59.942Z\"" source=console
...
█ TOTAL RESULTS
checks_total.......: 59 0.970975/s
checks_succeeded...: 40.67% 24 out of 59
checks_failed......: 59.32% 35 out of 59
✗ status is 200
↳ 40% — ✓ 24 / ✗ 35
...
We emphasize that the application never recovers, which can be observed over a longer test run (here, one hour):
running (0h50m08.0s), 1/1 VUs, 297 complete and 0 interrupted iterations
default [ 84% ] 1 VUs 0h50m08.0s/1h0m0s
running (0h50m09.0s), 1/1 VUs, 297 complete and 0 interrupted iterations
default [ 84% ] 1 VUs 0h50m09.0s/1h0m0s
running (0h50m10.0s), 1/1 VUs, 297 complete and 0 interrupted iterations
default [ 84% ] 1 VUs 0h50m10.0s/1h0m0s
running (0h50m11.0s), 1/1 VUs, 297 complete and 0 interrupted iterations
default [ 84% ] 1 VUs 0h50m11.0s/1h0m0s
time="2026-04-10T07:31:23Z" level=info msg="500 Internal Server Error { duration: 10010.194834, blocked: 0.404858, looking_up: 0, connecting: 0.348769, tls_handshaking: 0, sending: 0.033519, waiting: 10010.06085, receiving: 0.100465 }" source=console
time="2026-04-10T07:31:23Z" level=info msg="\"2026-04-10T07:31:23.999Z\"" source=console
However, after removing the failover plugin from the connection options, the application recovers after about 5 seconds, as expected. This corresponds to the TTL of the DNS resolution to the IP address of the alternative RDS instance:
execution: local
script: tests/endpoint-wrapper-test.js
output: -
scenarios: (100.00%) 1 scenario, 1 max VUs, 1m30s max duration (incl. graceful stop):
* default: 1 looping VUs for 1m0s (gracefulStop: 30s)
...
running (0m24.0s), 1/1 VUs, 22 complete and 0 interrupted iterations
default [ 40% ] 1 VUs 0m24.0s/1m0s
time="2026-04-10T08:19:06Z" level=info msg="200 OK { duration: 4.970019, blocked: 0.005365, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.01781, waiting: 4.82412, receiving: 0.128089 }" source=console
time="2026-04-10T08:19:06Z" level=info msg="\"2026-04-10T08:19:06.095Z\"" source=console
running (0m25.0s), 1/1 VUs, 23 complete and 0 interrupted iterations
default [ 42% ] 1 VUs 0m25.0s/1m0s
running (0m26.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 43% ] 1 VUs 0m26.0s/1m0s
running (0m27.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 45% ] 1 VUs 0m27.0s/1m0s
running (0m28.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 47% ] 1 VUs 0m28.0s/1m0s
running (0m29.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 48% ] 1 VUs 0m29.0s/1m0s
running (0m30.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 50% ] 1 VUs 0m30.0s/1m0s
time="2026-04-10T08:19:12Z" level=info msg="200 OK { duration: 5273.119536, blocked: 0.005668, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.017774, waiting: 5273.029983, receiving: 0.071779 }" source=console
time="2026-04-10T08:19:12Z" level=info msg="\"2026-04-10T08:19:12.369Z\"" source=console
running (0m31.0s), 1/1 VUs, 24 complete and 0 interrupted iterations
default [ 52% ] 1 VUs 0m31.0s/1m0s
time="2026-04-10T08:19:13Z" level=info msg="200 OK { duration: 4.968991, blocked: 0.005643, looking_up: 0, connecting: 0, tls_handshaking: 0, sending: 0.015134, waiting: 4.819513, receiving: 0.134344 }" source=console
time="2026-04-10T08:19:13Z" level=info msg="\"2026-04-10T08:19:13.375Z\"" source=console
...
█ TOTAL RESULTS
checks_total.......: 54 0.890355/s
checks_succeeded...: 100.00% 54 out of 54
checks_failed......: 0.00% 0 out of 54
✓ status is 200
...
We therefore conclude that failover is broken for applications running on Open Liberty when the failover plugins are enabled.
Reproduction Steps
We built a minimal application, running on Open Liberty, that simulates failover scenarios for different versions of the wrapper.
It can be found here: https://github.com/hlond/open-liberty-rds-failover-test
Specifically, it provides a simulated failover scenario based on k6 and the AWS CLI. For one minute, it generates load on a test endpoint that runs a trivial SQL query against the RDS cluster; after 10 seconds, a failover is triggered via the AWS CLI. This demonstrates how the failover behavior changes depending on the chosen version of the wrapper.
The README.md of the test project describes how to run it. After setting up the environment, build the application, parameterized by the wrapper version.
Build the application based on the last working version of the wrapper
docker compose build --build-arg AWS_ADVANCED_JDBC_WRAPPER_VERSION=2.6.0
Build the application based on the first broken version of the wrapper
docker compose build --build-arg AWS_ADVANCED_JDBC_WRAPPER_VERSION=2.6.1
Build the application based on a more recent version of the wrapper
docker compose build --build-arg AWS_ADVANCED_JDBC_WRAPPER_VERSION=3.3.0
Each build can be run via
The failover scenario can be run via
Possible Solution
We determined that the wrapper went from functional to non-functional between the patch releases 2.6.0 and 2.6.1.
We further narrowed the cause down to merge request #1444 by cherry-picking commit 5a7eec7 onto 2.6.0:
git checkout tags/2.6.0
git cherry-pick 5a7eec7
For our analysis, we built two images, one from tags/2.6.0 and one from tags/2.6.0 with 5a7eec7 cherry-picked, using:
./gradlew :aws-advanced-jdbc-wrapper:build -x test
The latter shows the same symptoms we have experienced since 2.6.1. We emphasize that the malfunction also persists in more recent versions, such as 3.3.0.
Additional Information/Context
Test Setup
In the test project we use:
- Aurora PostgreSQL cluster, version 17
- A recent version of Open Liberty
- Jakarta EE 10
- MicroProfile 7.1
- Persistence API 3.1
- A recent version of the PostgreSQL JDBC driver
- networkaddress.cache.ttl set to 5 seconds (in line with AWS guidance)
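The report does not show how the DNS cache TTL is configured in the test project. One standard way in Java is the `networkaddress.cache.ttl` security property; a minimal sketch, assuming it is set programmatically at startup (it can also be set in the JVM's java.security file). The class and method names are hypothetical.

```java
import java.security.Security;

public class DnsTtlSketch {

    // TTL in seconds for successful DNS lookups, matching the value used in the test setup.
    static final String DNS_TTL_SECONDS = "5";

    static void configureDnsCacheTtl() {
        // Set this before the first hostname lookup: the JVM typically reads the
        // property only once, when its address cache policy is initialized.
        Security.setProperty("networkaddress.cache.ttl", DNS_TTL_SECONDS);
    }

    public static void main(String[] args) {
        configureDnsCacheTtl();
        System.out.println("networkaddress.cache.ttl=" + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```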
We suspect that the problem lies in the interaction between the wrapper and the Connection Manager implementation in Open Liberty.
In our example application, we deliberately chose not to include exception handling, because we expect the Connection Manager to recognize the changed state after the failover and return to a functional state.
This was the case up to version 2.6.0. From version 2.6.1 onwards, the application no longer recovers.
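For comparison only, and deliberately not part of the test project, here is a minimal sketch of how application code could classify the wrapper's failover exceptions by SQLState. It assumes the codes described in the wrapper's failover plugin documentation: 08S02 (failover succeeded), 08007 (failover during a transaction, outcome unknown) and 08001 (failover failed). The enum, method and class names are hypothetical. This is not a proposed fix; it only illustrates the kind of handling the example application leaves to the Connection Manager.

```java
import java.sql.SQLException;

public class FailoverErrorClassifier {

    // Hypothetical actions an endpoint could take after catching an SQLException.
    enum Action { RETRY_ON_SAME_CONNECTION, RETRY_TRANSACTION, GET_NEW_CONNECTION, FAIL }

    static Action classify(SQLException e) {
        String state = e.getSQLState();
        if (state == null) {
            return Action.FAIL;
        }
        switch (state) {
            case "08S02": // failover succeeded; connection usable, session state must be re-applied
                return Action.RETRY_ON_SAME_CONNECTION;
            case "08007": // failover happened mid-transaction; its outcome is unknown
                return Action.RETRY_TRANSACTION;
            case "08001": // failover failed; the connection should not be reused
                return Action.GET_NEW_CONNECTION;
            default:
                return Action.FAIL;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new SQLException("simulated failover", "08S02")));
    }
}
```

Running `main` prints `RETRY_ON_SAME_CONNECTION`, since the simulated exception carries SQLState 08S02.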
The AWS Advanced JDBC Wrapper version used
2.6.0 (working); 2.6.1 (first not working); 3.3.0 (still not working)
JDK version used
IBM Semeru OpenJ9 17 (from Open Liberty image)
Operating System and version
Docker image open-liberty:kernel-slim-java17-openj9 (see test project Dockerfile)