java.util.ConcurrentModificationException occurring during HikariCP isConnectionAlive()/isConnectionDead() starting with release 2.3.3 #855

Closed
malacroix opened this issue Jan 26, 2024 · 14 comments

@malacroix

Describe the bug

After upgrading our AWS Advanced JDBC Driver to 2.3.3 (from 2.3.1) on various Spring Boot applications, we started noticing the following warning in our logs:

HikariPool-1 - Failed to validate connection software.amazon.jdbc.wrapper.ConnectionWrapper@de81c5e - com.mysql.cj.jdbc.ConnectionImpl@421c3a50 (java.util.ConcurrentModificationException). Possibly consider using a shorter maxLifetime value.

Looking at the HikariCP code, we see that the warning is generated in isConnectionDead() (see here). That method is called when a connection is borrowed from the connection pool, to validate whether the connection is still alive or dead (see here).

Expected Behavior

The ConcurrentModificationException should not occur.

What plugins are used? What other connection properties were set?

No specific plugins or other AWS specific connection properties configured, so using all defaults.

Current Behavior

We are getting the following error consistently:

HikariPool-1 - Failed to validate connection software.amazon.jdbc.wrapper.ConnectionWrapper@de81c5e - com.mysql.cj.jdbc.ConnectionImpl@421c3a50 (java.util.ConcurrentModificationException). Possibly consider using a shorter maxLifetime value.

This is causing the connection to be identified as dead and thus HikariCP closes the connection and gets another one from its pool.
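To illustrate why a single exception during validation is enough to evict a connection, here is a minimal sketch of the validate-on-borrow pattern. It is not HikariCP's actual code: the class name, the helper `failingConnection()`, and the log text are hypothetical, and the stub simply throws from `isValid()` to stand in for the wrapped driver connection.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ConcurrentModificationException;

class ValidationSketch {
    // Hypothetical, simplified check: any exception thrown while validating
    // is treated the same as "the connection is dead".
    static boolean isConnectionDead(Connection connection, int timeoutSeconds) {
        try {
            return !connection.isValid(timeoutSeconds);
        } catch (Exception e) {
            System.out.println("Failed to validate connection (" + e.getClass().getName() + ")");
            return true;
        }
    }

    // Stub connection whose isValid() throws, standing in for a wrapper
    // that fails internally during validation (an assumption for this demo).
    static Connection failingConnection() {
        return (Connection) Proxy.newProxyInstance(
            ValidationSketch.class.getClassLoader(),
            new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                if (method.getName().equals("isValid")) {
                    throw new ConcurrentModificationException();
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) {
        // The pool would discard this connection even though the database is healthy.
        System.out.println("dead = " + isConnectionDead(failingConnection(), 5));
    }
}
```

Under this pattern the pool cannot tell a broken database link apart from a bug inside the driver wrapper, which would explain why healthy connections get closed and replaced.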

Reproduction Steps

see above

Possible Solution

N/A

Additional Information/Context

We have multiple environments: some use AWS Aurora MySQL, while others run outside AWS and use plain MySQL (or MySQL from another provider). Many of our services have started upgrading to the AWS Advanced JDBC Driver.

This error doesn't happen in non-AWS environments that use the MySQL Connector/J driver com.mysql.cj.jdbc.Driver directly. It only happens in environments that use AWS Aurora databases with the JDBC driver set to software.amazon.jdbc.Driver and the JDBC protocol set to jdbc:aws-wrapper:mysql:.

The mysql-connector-j version in use doesn't seem to matter. We have seen the error on services using MySQL Connector/J 8.0.21, 8.0.33 and 8.2.0, but other services on the same versions do not show it.

We confirmed that the error first appears in the 2.3.3 release. Most of our services upgraded directly from 2.3.1 to 2.3.3, which is when we started seeing the errors. We then went back to 2.3.1 (where the issue wasn't seen) and upgraded by a single version to 2.3.2, and the error did not appear there either. That suggests the issue was introduced in 2.3.3 and was not present in 2.3.2 or earlier.

The AWS Advanced JDBC Driver version used

2.3.3

JDK version used

A wide range of JDK docker images (openjdk:11-jdk-slim, openjdk:11.0.5-jre-slim, eclipse-temurin:17-jdk...)

Operating System and version

see JDK version images

@malacroix malacroix added the bug Something isn't working label Jan 26, 2024
@crystall-bitquill
Contributor

Hi @malacroix,

Thanks for reaching out and raising this issue.

We'll take a look at this and keep you updated as we investigate.

Thank you for your patience!

@NikolayMetchev

I am seeing something similar. I think it might be caused by the efm2 plugin...

@aaronchung-bitquill
Contributor

Hi @malacroix

Would you be able to provide what versions of Spring Boot and Hikari you are using, as well as any configurations you may be using for them?

Thank you

@malacroix
Author

@aaronchung-bitquill We have many applications that are "micro-services", isolated from each other, and each uses a different set of dependency versions (though they are all on AWS Advanced JDBC Wrapper 2.3.3). I can get the different combinations of dependency versions they are using.
For Hikari, they all use the version managed by their Spring Boot version (which is defined in https://github.com/spring-projects/spring-boot/blob/main/spring-boot-project/spring-boot-dependencies/build.gradle).

As for the configurations, each application has its own application.yaml, so I won't provide those (they also contain a lot of irrelevant configuration). If there is a specific setting you want me to look at, let me know. Most of them set their datasource through the following properties:

spring:
  datasource:
    driver-class-name: software.amazon.jdbc.Driver
    url: <url>
    username: <user>
    password: <password>

I will come back soon with a good sample of the various combinations of dependencies for our different applications.

@canelzio

canelzio commented Feb 7, 2024

We are facing a similar issue during our stress tests. We are using AWS Advanced JDBC Driver 2.3.3 with HikariCP v5.0.1 and Spring Boot v3.1.7.
The stack trace says "java.util.ConcurrentModificationException: null" and leads to "software.amazon.jdbc.plugin.efm2.MonitorImpl.startMonitoring(MonitorImpl.java:169)".

Enabling efm (with efm2 disabled) does not lead to any problems.

@malacroix
Author

Oh thanks @canelzio for reporting this! I wanted to test disabling efm2 as the next step, but with your confirmation, I think it's rather clear that this points to an issue in efm2.

So, I'll still report the combinations of dependencies for our different applications that are using the AWS Advanced JDBC driver wrapper 2.3.3 and are getting the "java.util.ConcurrentModificationException" exception.

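| Java | Spring Boot | HikariCP | MySQL Connector/J |
|------|---------------|----------|-------------------|
| 11 | 2.6.6 | 4.0.3 | 8.2.0 |
| 11 | 2.7.9 | 4.0.3 | 8.2.0 |
| 17 | 3.1.0 | 5.0.1 | 8.2.0 |
| 11 | 2.2.0.RELEASE | 3.4.1 | 8.0.33 |
| 11 | 2.7.17 | 4.0.3 | 8.0.28 |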

Now, these are combinations of dependencies for applications that do not generate any "java.util.ConcurrentModificationException" exceptions.

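| Java | Spring Boot | HikariCP | MySQL Connector/J |
|------|---------------|----------|-------------------|
| 8 | 2.3.2 | 3.4.5 | 8.2.0 |
| 11 | 2.5.15 | 4.0.3 | 8.2.0 |
| 8 | 2.1.2.RELEASE | 3.2.0 | 8.2.0 |
| 8 | 2.5.9 | 4.0.3 | 8.2.0 |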

At first, I was thinking that there could be some conflict with Spring Boot 2.6.X and higher, but this one case with Spring Boot 2.2.0.RELEASE is confusing me.

@aaronchung-bitquill
Contributor

Hi @malacroix @NikolayMetchev @canelzio

We have just merged in a fix. Could you kindly check our snapshot build and let us know if the issue still persists?

Thank you!

@dejandb

dejandb commented Feb 7, 2024

The issue is in this line of code:
software.amazon.jdbc.plugin.efm2.MonitorImpl.startMonitoring(MonitorImpl.java:169)

The efm2 plugin uses a regular HashMap that is not thread-safe while the efm code uses a ConcurrentLinkedQueue that is thread-safe.

Here is the declaration in efm2:
private final HashMap<Long, Queue<WeakReference<MonitorConnectionContext>>> newContexts = new HashMap<>();

And efm:
private final Queue<MonitorConnectionContext> newContexts = new ConcurrentLinkedQueue<>();
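As a minimal illustration of the failure mode described here (not the driver's code), the sketch below shows that a plain HashMap fails fast when it is structurally modified while being iterated, whereas a concurrent map such as ConcurrentHashMap tolerates it. The class and method names are hypothetical, and the modification is done from the same thread only to make the effect deterministic; in the real scenario the writer would be another thread, where the exception is best-effort rather than guaranteed.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FailFastMapDemo {
    // Iterates over the map and inserts a new key midway, mimicking what an
    // unsynchronized writer on another thread would do to the same map.
    static boolean throwsCme(Map<Long, String> map) {
        map.put(1L, "a");
        map.put(2L, "b");
        try {
            for (Long key : map.keySet()) {
                // Structural modification while an iterator is active.
                map.put(-1L, "added during iteration");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap throws: " + throwsCme(new HashMap<>()));
        System.out.println("ConcurrentHashMap throws: " + throwsCme(new ConcurrentHashMap<>()));
    }
}
```

HashMap iterators detect the change through an internal modification counter and throw on the next step, while ConcurrentHashMap iterators are weakly consistent and never throw ConcurrentModificationException.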

I just filed an AWS support case earlier today with this information as we started getting these exceptions in one of our services under load.

@aaronchung-bitquill
Contributor

If this is a blocker for anyone, a workaround would be to use efm (instead of efm2) as @canelzio suggested.

For visibility, this can be done using the wrapperPlugins property as shown in the documentation.

Note that if you were previously using the default plugins by not specifying the wrapperPlugins property, you may also want to include the other default plugins like so:

properties.setProperty("wrapperPlugins", "auroraConnectionTracker,failover,efm");
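For context, the line above could sit in a small helper like the following sketch. Only the wrapperPlugins value comes from this thread; the class name, method name, and placeholder credentials are hypothetical, and the code deliberately stops short of opening a connection so it runs without the driver on the classpath.

```java
import java.util.Properties;

class EfmWorkaroundConfig {
    // Builds connection properties that select efm instead of the default efm2,
    // while keeping the other default plugins named in this thread.
    static Properties buildProperties(String user, String password) {
        Properties properties = new Properties();
        properties.setProperty("user", user);
        properties.setProperty("password", password);
        properties.setProperty("wrapperPlugins", "auroraConnectionTracker,failover,efm");
        return properties;
    }

    public static void main(String[] args) {
        Properties properties = buildProperties("<user>", "<password>");
        // With the wrapper driver available, these properties would be passed to
        // DriverManager.getConnection("jdbc:aws-wrapper:mysql://<host>/<db>", properties).
        System.out.println(properties.getProperty("wrapperPlugins"));
    }
}
```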

@dejandb

dejandb commented Feb 8, 2024

We already figured it out and are changing the configuration to use efm rather than efm2.

I also saw in your code that you are using ConcurrentHashMap - while that will solve the issue, ConcurrentHashMap comes with a performance hit - it will be better to use Caffeine (https://github.com/ben-manes/caffeine/wiki/Benchmarks) since it's much faster and offers very close semantics to Guava's cache or ConcurrentHashMap.

@canelzio

canelzio commented Feb 8, 2024

Hi @aaronchung-bitquill
I just tried the new snapshot, and during our stress test session we didn't hit any issues. So the fix seems to solve the problem.
Regarding performance (as @dejandb pointed out), ConcurrentHashMap introduced a very small delay. Under the same conditions (50 request threads), the new snapshot reached 54.5 TPS, while the 2.3.3 release got around 55.5 TPS, meaning a slowdown of less than 2%. Still acceptable from my side.
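The slowdown figure can be checked with a short calculation. The sketch below uses only the two TPS values reported above; the class and method names are hypothetical.

```java
class SlowdownCheck {
    // Relative throughput loss of the candidate build versus the baseline, in percent.
    static double slowdownPercent(double baselineTps, double candidateTps) {
        return (baselineTps - candidateTps) / baselineTps * 100.0;
    }

    public static void main(String[] args) {
        // 2.3.3 release: ~55.5 TPS; new snapshot: 54.5 TPS.
        System.out.println(slowdownPercent(55.5, 54.5));
    }
}
```

The result is roughly 1.8%, consistent with the "less than 2%" estimate.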

@malacroix
Author

@aaronchung-bitquill, I can also confirm that the latest 2.3.4 Snapshot doesn't generate the error anymore.

@aaronchung-bitquill
Contributor

@dejandb Thank you for suggesting Caffeine. Those benchmarks look promising. We'll give it a look.

@canelzio @malacroix Glad to hear that it is working for you guys and with similar performance. As such, I'll be closing this ticket. However, if you require further assistance, feel free to reopen this issue or create a new one.

Thank you!

@aaronchung-bitquill
Contributor

Hi @malacroix @NikolayMetchev @canelzio @dejandb

I wanted to let you know that the fix is now available on the latest version of the driver.

Thank you!

@aaronchung-bitquill aaronchung-bitquill removed the pending release Resolution implemented, pending official release label Mar 2, 2024
7 participants