
The connection pool size decreases from 20 (maximum pool size) to zero [Azure SQL Managed Instance] #2239

Closed
lucaoliano opened this issue Sep 18, 2024 · 6 comments

@lucaoliano

Hi,
We are experiencing issues similar to #1474: the total connections gradually decrease from the maximum pool size (20) down to zero. See the evidence from the debug log:

2024-09-17 09:14:38.000 DEBUG 603 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=19, active=0, idle=19, waiting=0)
…
2024-09-17 09:15:08.000 DEBUG 603 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=18, active=0, idle=18, waiting=0)
…
2024-09-17 09:43:38.092 DEBUG 603 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=0, active=0, idle=0, waiting=1)

When the total connections reach zero, all requests to the connection pool fail because no connection is available, and the following exception is thrown:

org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30002ms.

Here are the details of our application:

  • DB is Azure SQL Managed Instance, with the Proxy connection policy
  • HikariCP version is 4.0.3; we only set the maximum pool size, and all other settings are left at their defaults
  • JDBC driver is Microsoft SQL Server JDBC driver 12.4.0

We reached out to Microsoft support, who confirmed that Azure SQL Managed Instance (SQL MI) uses an idle connection timeout mechanism: connections that remain idle for approximately 30 minutes are automatically closed. This behavior is managed by the SQL MI proxy layer to free up resources and maintain optimal performance. To prevent connections from timing out, they recommend implementing a keep-alive mechanism or sending periodic queries to keep the connection active.

However, they suggested opening a GitHub issue against HikariCP, since they don’t have enough expertise or a dedicated team to provide specific settings.

Can you advise on how to resolve this?

  1. Is there a known bug with a fix version?
  2. Or should we try specific settings, e.g. the maxLifetime and keepaliveTime settings described below?

  • maxLifetime: setting this to 15 minutes (900,000 ms) keeps the lifetime of pooled connections shorter than database- or infrastructure-imposed timeouts, such as the 30-minute idle timeout of Azure SQL Managed Instance. This helps avoid situations where the database closes idle connections unexpectedly.
  • keepaliveTime: setting this to 5 minutes (300,000 ms) makes HikariCP issue periodic keep-alive probes on idle connections. It should be less than maxLifetime so idle connections stay active.
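For reference, here is a minimal sketch of the configuration we are considering (the JDBC URL, user, and password are placeholders for our real values):

  import com.zaxxer.hikari.HikariConfig;
  import com.zaxxer.hikari.HikariDataSource;

  HikariConfig config = new HikariConfig();
  config.setJdbcUrl("jdbc:sqlserver://<our-server>.database.windows.net:1433;databaseName=<our-db>");
  config.setUsername("<user>");
  config.setPassword("<password>");
  config.setMaximumPoolSize(20);
  // Keep pooled connections shorter-lived than the ~30-minute SQL MI proxy idle timeout.
  config.setMaxLifetime(900_000L);   // 15 minutes
  // Ping idle connections so the proxy never sees them as idle for 30 minutes.
  config.setKeepaliveTime(300_000L); // 5 minutes; must be less than maxLifetime
  HikariDataSource dataSource = new HikariDataSource(config);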

@quaff
Contributor

quaff commented Sep 27, 2024

I think you should set idleTimeout to less than 30 minutes.
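
For example, a minimal sketch; note that HikariCP honors idleTimeout only when minimumIdle is configured to be less than maximumPoolSize, so minimumIdle must be lowered as well:

  HikariConfig config = new HikariConfig();
  config.setMaximumPoolSize(20);
  config.setMinimumIdle(5);        // idleTimeout is honored only when minimumIdle < maximumPoolSize
  config.setIdleTimeout(600_000L); // 10 minutes, well under the ~30-minute SQL MI proxy timeout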

@lucaoliano
Author

Hi @quaff,
thank you for your reply.

I have already set idleTimeout, but I am still experiencing the issue. Would you also suggest setting maxLifetime and/or keepaliveTime, as mentioned in my issue description?

@nitin-vavdiya

Great article on this issue: https://medium.com/@eremeykin/how-to-deal-with-hikaricp-connection-leaks-part-1-1eddc135b464

I am unsure whether this solves your issue, but it may help you debug it and find which code is causing the problem.

@lucaoliano
Author

Hi @nitin-vavdiya, thank you for your feedback. We have checked the scenarios described in the link you shared, and none of them apply to us. As stated in the initial description, our scenario is very similar to the one reported in many other linked issues, e.g.
#1474 (comment), the only difference being that we use Azure SQL Managed Instance. Since many report the same issue in different contexts, with different databases, this strongly suggests a bug in Hikari. For your information, we have been running other applications against SQL Managed Instance with the Tomcat connection pool, and we have not seen the same issue.

@lucaoliano
Author

Hi, we have decided to move away from Hikari in favor of the Tomcat connection pool. However, I just found this article,
https://www.theguardian.com/info/2019/dec/02/faster-postgresql-connection-recovery, referenced in a comment on a very similar issue, #2161 (comment), which states that "HikariCP recommends that the driver-level socket timeout be set to (at least) 2-3x the longest running SQL transaction, or 30 seconds, whichever is longer". @brettwooldridge, if this is confirmed, we can try setting socketTimeout at the MS SQL JDBC driver level, as sketched below.
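A minimal sketch of what we would try, using the socketTimeout connection property of the Microsoft JDBC driver (value in milliseconds; server and database names are placeholders):

  HikariConfig config = new HikariConfig();
  // socketTimeout is a connection property of the Microsoft JDBC driver, in milliseconds;
  // its default of 0 means no timeout, so a read on a dead socket can block indefinitely.
  config.setJdbcUrl("jdbc:sqlserver://<our-server>.database.windows.net:1433;"
          + "databaseName=<our-db>;socketTimeout=30000"); // 30 seconds, per the guideline above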

@brettwooldridge
Owner

brettwooldridge commented Oct 24, 2024

Socket-level timeout and TCP keepalive are essential for reliability, regardless of the pool. See the new “Important” note added at the top of the main repo page.

So far, all reports of this issue that have included stacktrace analysis have pointed to TCP-layer issues; none has indicated any failure in the pool itself.

Additionally, running the latest version of the pool is always recommended.
