
Spring Boot - Loss of the master and the first sentinel node causes a READONLY exception #2548

Closed
wuase opened this issue Nov 13, 2023 · 3 comments

Comments


wuase commented Nov 13, 2023

Lettuce doesn't seem able to recognize topology changes if the first configured sentinel node goes down together with the master.
Even configuring a periodic topology refresh seems to be ineffective.

I know from this discussion that looking up the master through just one sentinel is the intended behavior; is there any workaround or procedure to re-establish the connection to the correct master?

Update: it seems that the affected connections are the ones already established by the pool; new connections work like a charm.
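For reference, when connecting through Lettuce's Master/Replica API directly, all configured sentinels go into the `RedisURI`, so a failover announced by any surviving sentinel should be picked up even if the first one is down. A minimal sketch (Lettuce 6.x class and package names; worth verifying against your version, and the hostnames are the ones from the configuration below):

```java
import java.time.Duration;

import io.lettuce.core.ReadFrom;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.codec.StringCodec;
import io.lettuce.core.masterreplica.MasterReplica;
import io.lettuce.core.masterreplica.StatefulRedisMasterReplicaConnection;

public class SentinelExample {
    public static void main(String[] args) {
        // All three sentinels are listed in the URI; Lettuce subscribes to
        // sentinel events for topology changes on this kind of connection.
        RedisURI uri = RedisURI.Builder.sentinel("sentinel-1", 26379, "mymaster")
                .withSentinel("sentinel-2", 26380)
                .withSentinel("sentinel-3", 26381)
                .withTimeout(Duration.ofSeconds(60))
                .build();

        RedisClient client = RedisClient.create();
        StatefulRedisMasterReplicaConnection<String, String> connection =
                MasterReplica.connect(client, StringCodec.UTF8, uri);
        connection.setReadFrom(ReadFrom.LOWEST_LATENCY);
        // ... use connection.sync() / async(), then close() and shutdown()
    }
}
```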
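Since only pooled connections are affected, one mitigation worth trying is connection validation on the commons-pool2 side, so stale connections left over from a failover are evicted instead of being handed back to the application. A hedged sketch using standard `GenericObjectPoolConfig` knobs (whether validation actually detects a connection now pointing at a demoted replica depends on the client's validation strategy, so this is not a confirmed fix):

```java
import java.time.Duration;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

public class PoolValidationExample {
    public static void main(String[] args) {
        GenericObjectPoolConfig<Object> pool = new GenericObjectPoolConfig<>();
        // Validate connections when they are borrowed, and periodically
        // while idle, so broken connections are discarded by the pool.
        pool.setTestOnBorrow(true);
        pool.setTestWhileIdle(true);
        pool.setTimeBetweenEvictionRuns(Duration.ofSeconds(30));
    }
}
```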

Redis configuration:

  • 3 Redis instances (1 master + 2 readonly replicas)
  • 3 Sentinel instances

Scenario:
Spring boot application is connected to Redis by sentinels through lettuce with pool.

  • The master and the first sentinel node go down.
  • A new master is elected by the remaining sentinels.
  • The old master comes back up.
  • The sentinel comes back up.
  • Exceptions are thrown when trying to perform writes.

Spring boot configuration:

spring:
  redis:
    database: 0
    host: sentinel-1
    port: 26379
    timeout: 60000
    sentinel:
      master: "mymaster"
      nodes:
        - "sentinel-1:26379"
        - "sentinel-2:26380"
        - "sentinel-3:26381"
...
    @Bean
    public LettuceClientConfiguration clientConfiguration() {

        GenericObjectPoolConfig<Object> pool = new GenericObjectPoolConfig<>();
        pool.setMaxTotal(100);
        pool.setMaxIdle(100);
        pool.setMinIdle(10);
        pool.setBlockWhenExhausted(true);
        pool.setMaxWait(Duration.of(2, ChronoUnit.SECONDS));

        return LettucePoolingClientConfiguration.builder()
                .poolConfig(pool)
                .readFrom(ReadFrom.LOWEST_LATENCY)
                .clientOptions(ClientOptions.builder()
                        .pingBeforeActivateConnection(true)
                        .autoReconnect(true)
                        .socketOptions(SocketOptions.builder()
                                .keepAlive(true)
                                .build())
                        .cancelCommandsOnReconnectFailure(true)
                        .suspendReconnectOnProtocolFailure(false)
                        .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                        .build())
                .build();
    }
...

Stacktrace:

 org.springframework.data.redis.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave.
 	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:54) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:52) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1085) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:938) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:94) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.del(LettuceKeyCommands.java:106) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.connection.DefaultedRedisConnection.del(DefaultedRedisConnection.java:95) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[na:na]
 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
 	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
 	at org.springframework.data.redis.core.CloseSuppressingInvocationHandler.invoke(CloseSuppressingInvocationHandler.java:61) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at jdk.proxy2/jdk.proxy2.$Proxy120.del(Unknown Source) ~[na:na]
 	at org.springframework.data.redis.core.RedisKeyValueAdapter.lambda$put$0(RedisKeyValueAdapter.java:235) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:178) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.core.RedisKeyValueAdapter.put(RedisKeyValueAdapter.java:230) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.keyvalue.core.KeyValueTemplate.lambda$update$1(KeyValueTemplate.java:221) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.keyvalue.core.KeyValueTemplate.execute(KeyValueTemplate.java:362) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.keyvalue.core.KeyValueTemplate.update(KeyValueTemplate.java:221) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.redis.core.RedisKeyValueTemplate.update(RedisKeyValueTemplate.java:178) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.keyvalue.repository.support.SimpleKeyValueRepository.save(SimpleKeyValueRepository.java:80) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
 	at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) ~[na:na]
 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
 	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
 	at org.springframework.data.repository.core.support.RepositoryMethodInvoker$RepositoryFragmentMethodInvoker.lambda$new$0(RepositoryMethodInvoker.java:289) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:137) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:121) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.RepositoryComposition$RepositoryFragments.invoke(RepositoryComposition.java:530) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.RepositoryComposition.invoke(RepositoryComposition.java:286) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.RepositoryFactorySupport$ImplementationMethodExecutionInterceptor.invoke(RepositoryFactorySupport.java:640) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
 	at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:164) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:139) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
 	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.30.jar!/:5.3.30]
 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
 	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241) ~[spring-aop-5.3.30.jar!/:5.3.30]
 	at jdk.proxy2/jdk.proxy2.$Proxy87.save(Unknown Source) ~[na:na]
 	at com.mycompany.redissentineltest.RedisSentinelTestApplication.writeOnRedis(RedisSentinelTestApplication.java:47) ~[classes!/:0.0.1-SNAPSHOT]
 	at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) ~[na:na]
 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
 	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
 	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.3.30.jar!/:5.3.30]
 	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.30.jar!/:5.3.30]
 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
 	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
 	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
 Caused by: io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave.
 	at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:144) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:116) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:120) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:747) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:682) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:599) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
 	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
 	... 1 common frames omitted

Thanks,
Angelo


wuase commented Nov 16, 2023

The issue with the pending pool connections seems to be related to the L3 TCP timeout problem discussed in this issue;
I'll update and eventually close this issue after some further tests.

@tishun tishun added the triage label Jul 5, 2024
@tishun tishun added for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage and removed triage labels Jul 17, 2024
@wuase wuase closed this as completed Aug 1, 2024
@tishun tishun removed for: team-attention An issue we need to discuss as a team to make progress status: waiting-for-triage labels Aug 1, 2024
@xingzhang8023

@wuase I have the same issue; is there an analysis conclusion for this issue?


wuase commented Aug 13, 2024

@xingzhang8023 Socket hang-ups were occurring; kernel fine-tuning fixed the issue.

Check out these articles
https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die
https://access.redhat.com/solutions/726753
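On the application side, Lettuce 6.1+ also exposes extended TCP keep-alive settings that serve a similar purpose to the kernel tuning those articles describe (as far as I recall, the fine-grained idle/interval/count values only take effect on native transports such as epoll). A hedged sketch; the concrete durations are illustrative, not recommendations:

```java
import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.SocketOptions;

public class KeepAliveExample {
    public static void main(String[] args) {
        // Probe idle connections aggressively so dead peers are detected in
        // tens of seconds rather than the kernel defaults (~2 hours idle).
        SocketOptions.KeepAliveOptions keepAlive = SocketOptions.KeepAliveOptions.builder()
                .enable()
                .idle(Duration.ofSeconds(30))     // start probing after 30 s idle
                .interval(Duration.ofSeconds(10)) // probe every 10 s
                .count(3)                         // give up after 3 failed probes
                .build();

        ClientOptions options = ClientOptions.builder()
                .socketOptions(SocketOptions.builder()
                        .keepAlive(keepAlive)
                        .build())
                .build();
        // pass `options` to RedisClient#setOptions or into the
        // clientOptions(...) of the LettuceClientConfiguration builder above
    }
}
```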
