Add support for disconnect on timeout to recover early from no RST packet failures #2082
@wpstan You can't …
@wpstan keepalive is only valid for …
Thanks a lot for the detailed analysis and write-up. I wasn't aware of the retransmit vs. keepalive priority relationship. With a blocking I/O client like Jedis it makes a lot of sense to discard connections that have timed out, as the read is initiated by the method being called; otherwise, a late data reception (caused by a slow server response) leads to protocol desynchronization. Lettuce has a command timeout mechanism, too. However, its internal architecture allows us to continue operations even when a command times out (mostly to protect the caller), as command response parsing is event-driven. Disconnecting the connection upon command timeout is a pretty drastic approach that can work in certain scenarios where you cannot customize retransmit parameters. With the current configuration means (connection events, customization of the Netty bootstrap, command listeners) you have all the necessary bits to implement such a behavior within your application. If we learn that a larger part of our community is interested in such a feature out of the box, then we will consider such an enhancement.
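A minimal sketch of the application-level approach described in the comment above, in plain Java (no Lettuce types; `TimeoutGuard`, the threshold, and the reset action are illustrative names, not a Lettuce API): wrap each blocking call, and after a number of consecutive timeouts run a reset action, which in a real Lettuce application could close or reset the connection so the reconnect machinery builds a fresh one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a Lettuce API: wraps a blocking call and, after
// `threshold` consecutive timeouts, runs a caller-supplied reset action
// (in a Lettuce app this could close the connection so it gets rebuilt).
class TimeoutGuard {

    private final AtomicInteger consecutiveTimeouts = new AtomicInteger();
    private final int threshold;
    private final Runnable resetAction;

    TimeoutGuard(int threshold, Runnable resetAction) {
        this.threshold = threshold;
        this.resetAction = resetAction;
    }

    <T> T call(Callable<T> command) throws Exception {
        try {
            T result = command.call();
            consecutiveTimeouts.set(0); // any success clears the streak
            return result;
        } catch (TimeoutException e) {
            if (consecutiveTimeouts.incrementAndGet() >= threshold) {
                consecutiveTimeouts.set(0);
                resetAction.run(); // e.g. discard and re-establish the connection
            }
            throw e; // still surface the timeout to the caller
        }
    }
}
```

The success path clearing the counter is what avoids overreacting to a single slow command, which is exactly the concern raised later in the thread.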
Yes, adjusting parameters such as …
For many of our customers, because of this issue, we need documentation explaining why, and …
I personally also really like the … Can you guide me to enhance this if you don't have time to spare? Thanks.
Indeed, …
Agreed, but I think a better strategy is to reconnect after X (1 by default) …
I agree that it might not make sense to disconnect after the first timeout. The tricky part is to detect the right state. Let's assume a disconnect after … One might have a command sequence of … We have in other places a delay where we debounce events and delay activity, such as adaptive topology updates in Redis Cluster, where not every … In any case, such a feature requires a bit more thought.
I mean from X …
@mp911de What do you think of this strategy? If agreed, I will prepare a PR.
@yangbodong22011 How's the PR going :) We need this mechanism badly. |
Waiting for @mp911de to have time to process it, we don't have a firm strategy yet. |
Any update? |
PingConnectionHandler periodically sends PING commands to the Redis server and decides whether to reconnect based on whether they fail (the current strategy is to reconnect after three consecutive failures). This is essentially because KeepAlive that relies only on TCP is unreliable; for details, please refer to: redis#2082
@yangbodong22011 Hi, the PR #2253 is closed, so could you provide a custom Spring Boot starter if you have time? We really need this connection validation mechanism. Compared to switching the driver to Jedis, a Spring Boot starter is easier to use for Lettuce users.
Hi, it now seems that keep-alive is the better measure for Lettuce. See this comment: #2253 (comment)
But keep-alive does not solve the problem raised by this issue, which is why we have kept it open. If #2253 is not what we want, I think we need to continue to communicate and discuss to completely solve this problem. |
@mp911de I think you should consider a resolution seriously; many people are deeply troubled by this problem. I also found that we have had many discussions about this problem, but no further solution has been shipped by the Lettuce team. We really have no choice but to let our customers choose Jedis for better behavior under network failures. @yangbodong22011
Refer to #1428; we resolved this problem by adding keep-alive and TCP_USER_TIMEOUT options in our client configuration. Keep-alive alone wouldn't work in a situation where the client sends requests to the server continuously but the server never ACKs. You need to add netty-transport-native-epoll to your dependencies.

Gradle deps:

```gradle
api 'io.netty:netty-transport-native-epoll:4.1.65.Final:linux-x86_64'
api 'io.netty:netty-transport-native-epoll:4.1.65.Final:linux-aarch_64'
```

Java code example (`tcpUserTimeout` and `connectTimeout` are values you define yourself):

```java
// Customize Netty: set TCP_USER_TIMEOUT when the native epoll transport is available.
ClientResources clientResources = ClientResources.builder()
        .nettyCustomizer(new NettyCustomizer() {
            @Override
            public void afterBootstrapInitialized(Bootstrap bootstrap) {
                if (EpollProvider.isAvailable()) {
                    // TCP_USER_TIMEOUT >= TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT
                    // https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
                    bootstrap.option(EpollChannelOption.TCP_USER_TIMEOUT, tcpUserTimeout);
                }
            }
        })
        .build();

// Create your socket options (keep-alive: probe after 5 s idle, every 1 s, give up after 3 probes).
SocketOptions socketOptions = SocketOptions.builder()
        .connectTimeout(connectTimeout)
        .keepAlive(SocketOptions.KeepAliveOptions.builder()
                .enable()
                .idle(Duration.ofSeconds(5))
                .interval(Duration.ofSeconds(1))
                .count(3)
                .build())
        .build();
```
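The comment in the snippet above encodes a rule from the linked Cloudflare post: TCP_USER_TIMEOUT should be at least TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT, so the user timeout cannot fire before keepalive has had a full chance to declare the peer dead. A small helper (illustrative, not part of Lettuce) to sanity-check the two settings together:

```java
// Illustrative helper: validate that TCP_USER_TIMEOUT (seconds) is consistent
// with the keepalive settings, i.e. userTimeout >= idle + interval * count.
class TcpTimeouts {

    // Minimum safe TCP_USER_TIMEOUT for the given keepalive parameters.
    static int minUserTimeoutSeconds(int keepIdle, int keepInterval, int keepCount) {
        return keepIdle + keepInterval * keepCount;
    }

    static boolean isConsistent(int userTimeout, int keepIdle, int keepInterval, int keepCount) {
        return userTimeout >= minUserTimeoutSeconds(keepIdle, keepInterval, keepCount);
    }
}
```

With the snippet's values (idle 5 s, interval 1 s, count 3), the minimum TCP_USER_TIMEOUT would be 8 s.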
@richieyan Thanks for your comments and code. I did some tests; here are the results. The tests have the following prerequisites: …
To reproduce this test, you need to pay attention to: …
Summary: 1. TCP_USER_TIMEOUT can indeed solve the problem of this issue on the Linux platform.
Hi @yangbodong22011 @richieyan, hope you are doing well! My test steps: …
After the primary restarts, Lettuce can reconnect to the replica. But I think the reconnect time should be controlled by TCP_USER_TIMEOUT; I don't know why the reconnection time is constantly 10 s.
@chuckz1321 Your reconnection is not due to a missing RST; evidently, your server sent a … If you want to simulate this problem, you need to use the iptables command mentioned in the top comment.
Thanks for your feedback. Once the connection is broken, it seems Netty becomes aware of it first, and the following Redis command will use another connection.

io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection timed out
2023-04-12 08:40:05.927 INFO 44928 --- [nio-9888-exec-9] com.azure.redis4jedis.TestController : start to query from redis1681288805927

Any insights?
@chuckz1321 This issue does not apply if the client machine is on … If the client machine is already on Linux, please confirm the value of the tcp_retries2 parameter.
@yangbodong22011 If I leave all the configuration at defaults, I can reproduce 15 retransmissions.
@huaxne Set TCP_USER_TIMEOUT; see #2082 (comment). @mp911de Would you consider adding a TCP_USER_TIMEOUT config to Lettuce to fix this? I can contribute a PR.
@yangbodong22011 Adding …
okay, I will prepare a PR. |
Snapshots are deployed at https://oss.sonatype.org/content/repositories/snapshots/io/lettuce/lettuce-core/6.3.0.BUILD-SNAPSHOT/lettuce-core-6.3.0.BUILD-20230901.094627-49.jar, also available for …
@mp911de I verified …
My pom.xml is (note: Linux needs the native-epoll transport) …
In debug mode, you will see this log: …
Usually, …
@mp911de Hello, when do you expect to release the official (non-SNAPSHOT) version? We need to notify users to upgrade to it.
I updated the release date to November 14, 2023. |
Hello, has the release time changed? We also need to notify users to upgrade, considering that we have a large number of users waiting for this version.
Yeah, Project Reactor released just last night, so our release has slipped to today.
@mp911de Will this updated Lettuce (6.3.0) be available in any of the upcoming Spring Data Redis 3.1.x releases?
No, because Spring upgrades only to bugfix releases in their bugfix releases. You can in any case upgrade the version yourself, as Lettuce 6.3 can be used as a drop-in replacement. Generally, upgrading to a newer Lettuce version works better than downgrading.
@mp911de @yangbodong22011 Maybe use the way Spring Boot provides:

```java
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.stereotype.Component;

import lombok.extern.slf4j.Slf4j;

@Slf4j
@Component
public class LettuceConfig implements InitializingBean {

    @Autowired
    private RedisConnectionFactory redisConnectionFactory;

    @Override
    public void afterPropertiesSet() {
        // Enable connection validation so a dead shared connection is replaced.
        if (redisConnectionFactory instanceof LettuceConnectionFactory c) {
            c.setValidateConnection(true);
        }
    }
}
```
@wayn111 |
@2luckydog Check your log: does it contain “Validation of shared connection failed; Creating a new connection.”?
@wayn111 This content is not included. |
@wayn111 |
The detailed solution is as follows:

```xml
<dependencies>
    <dependency>
        <groupId>io.lettuce</groupId>
        <artifactId>lettuce-core</artifactId>
        <version>6.3.0.RELEASE</version>
    </dependency>
    <dependency>
        <groupId>io.netty</groupId>
        <artifactId>netty-transport-native-epoll</artifactId>
        <version>4.1.100.Final</version>
        <classifier>linux-x86_64</classifier>
    </dependency>
</dependencies>
```

```java
import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.SocketOptions;
import io.lettuce.core.SocketOptions.KeepAliveOptions;
import io.lettuce.core.SocketOptions.TcpUserTimeoutOptions;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

import java.time.Duration;

public class LettuceExample {

    /**
     * Enable TCP keepalive and configure the following three parameters:
     * TCP_KEEPIDLE = 30
     * TCP_KEEPINTVL = 10
     * TCP_KEEPCNT = 3
     */
    private static final int TCP_KEEPALIVE_IDLE = 30;

    /**
     * The TCP_USER_TIMEOUT parameter avoids situations where Lettuce remains stuck
     * in a continuous timeout loop during a failure or crash event.
     * Refer to: https://github.com/lettuce-io/lettuce-core/issues/2082
     */
    private static final int TCP_USER_TIMEOUT = 30;

    private static RedisClient client = null;
    private static StatefulRedisConnection<String, String> connection = null;

    public static void main(String[] args) {
        // Replace the values of host, user, password, and port with the actual instance information.
        String host = "r-bp1s1bt2tlq3p1****.redis.rds.aliyuncs.com";
        String user = "r-bp1s1bt2tlq3p1****";
        String password = "Da****3";
        int port = 6379;

        // Configure the RedisURI.
        RedisURI uri = RedisURI.Builder
                .redis(host, port)
                .withAuthentication(user, password)
                .build();

        // Configure TCP keepalive and TCP_USER_TIMEOUT.
        SocketOptions socketOptions = SocketOptions.builder()
                .keepAlive(KeepAliveOptions.builder()
                        .enable()
                        .idle(Duration.ofSeconds(TCP_KEEPALIVE_IDLE))
                        .interval(Duration.ofSeconds(TCP_KEEPALIVE_IDLE / 3))
                        .count(3)
                        .build())
                .tcpUserTimeout(TcpUserTimeoutOptions.builder()
                        .enable()
                        .tcpUserTimeout(Duration.ofSeconds(TCP_USER_TIMEOUT))
                        .build())
                .build();

        client = RedisClient.create(uri);
        client.setOptions(ClientOptions.builder()
                .socketOptions(socketOptions)
                .build());

        connection = client.connect();
        RedisCommands<String, String> commands = connection.sync();
        System.out.println(commands.set("foo", "bar"));
        System.out.println(commands.get("foo"));

        // If your application exits and you want to destroy the resources, call these
        // methods. The connection is then closed and the resources are released.
        connection.close();
        client.shutdown();
    }
}
```
Bug Report
I'm one of the Jedis Reviewers and our customers are experiencing unrecoverable issues with Lettuce in production.
Lettuce connects to a Redis host and reads and writes normally. However, if the host fails (a hardware problem directly causes a shutdown, and no RST is sent to the client), the client will continue to time out until TCP retransmission gives up, and only then can it recover. On Linux this takes about 925.6 s (refer to tcp_retries2).
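The ~925 s figure can be sanity-checked with a back-of-the-envelope model of Linux's retransmission backoff (an assumption for illustration: the first RTO is taken as the 200 ms minimum, each retransmission doubles it, capped at the 120 s TCP_RTO_MAX; in reality the first RTO derives from the measured RTT):

```java
// Rough model of the Linux retransmission timer: with tcp_retries2 = 15,
// the RTO sequence 0.2, 0.4, 0.8, ... doubles until capped at 120 s; summing
// the initial send plus 15 retries gives roughly 924.6 s, matching the
// ~925 s stall described above.
class RetransClock {

    static double totalSeconds(int tcpRetries2, double initialRto, double maxRto) {
        double total = 0;
        double rto = initialRto;
        for (int i = 0; i <= tcpRetries2; i++) { // initial send + retries
            total += rto;
            rto = Math.min(rto * 2, maxRto);
        }
        return total;
    }
}
```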
Why KeepAlive doesn't fix this
#1437 (Lettuce supports the option to set KEEPALIVE since version 6.1.0 )
Because retransmitted packets take priority over keepalive probes, the connection keeps retransmitting and never reaches the keepalive stage; it reconnects only after retransmission finally gives up.
In what scenarios does this issue occur?
How to reproduce this issue
Observe that the client starts timing out and cannot recover until about 925.6 s have passed (related to tcp_retries2)
After the test, clear the iptables rules
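The reproduction relies on an iptables rule that silently drops the server's packets, so the client sees pure packet loss with no RST. A sketch, assuming the Redis server listens on port 6379 and the rule is added on the client machine (requires root; adjust the port to your setup):

```shell
# Drop everything arriving from the Redis port: the client's segments go
# unanswered, so TCP starts its retransmission backoff with no RST to stop it.
iptables -A INPUT -p tcp --sport 6379 -j DROP

# ... run the client workload and observe the ~925 s stall ...

# After the test, delete the rule again (as the step above says).
iptables -D INPUT -p tcp --sport 6379 -j DROP
```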
How to fix this
We should provide an application-layer keepalive mechanism: on the underlying Netty channel, periodically send a heartbeat packet; if the heartbeat times out, the client initiates a reconnect to recover quickly.
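This proposal can be sketched in plain Java (names are illustrative; a real implementation would live in the Netty pipeline and the probe would be a Redis PING): a scheduled heartbeat that triggers a reconnect action after a number of consecutive failures, much like the Redisson PingConnectionHandler mentioned earlier in this thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Illustrative sketch, not Lettuce code: periodically run a heartbeat probe
// and invoke a reconnect action after `maxFailures` consecutive failures.
class PingWatchdog {

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private int failures = 0; // only touched from the timer thread

    void start(BooleanSupplier ping, Runnable reconnect,
               int maxFailures, long intervalMillis) {
        timer.scheduleAtFixedRate(() -> {
            if (ping.getAsBoolean()) {
                failures = 0; // healthy, clear the failure streak
            } else if (++failures >= maxFailures) {
                failures = 0;
                reconnect.run(); // e.g. close the channel and re-establish it
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        timer.shutdownNow();
    }
}
```

Because the heartbeat bypasses the in-flight retransmission queue at the application layer, the reconnect fires after roughly maxFailures * intervalMillis instead of the ~925 s TCP retransmission limit.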
How Jedis avoids this problem
Jedis uses a connection pool. When an API call times out, Jedis destroys the connection and obtains a new one from the pool, which avoids the problem above.
Environment