Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket broker disconnects when client enable heartbeats #27503

Closed
rd-o opened this issue Jan 2, 2020 · 25 comments
Closed

Websocket broker disconnects when client enable heartbeats #27503

rd-o opened this issue Jan 2, 2020 · 25 comments
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket)

Comments

@rd-o
Copy link

rd-o commented Jan 2, 2020

After a spring boot websocket client connects to the websocket spring boot server, the server send some heartbeats, and then after some minutes in the trace logs shows "Connection to broker closed.". Removing setTaskScheduler() from the client seems to work without problems.

Websocket client with SockJsClient and heartbeats enabled.
Websocket server with enableStompBrokerRelay() (ActiveMQ) and withSockJS()

Here a POC with the client and server, it needs a ActiveMQ broker:
https://github.com/rd-o/websocket-spring-boot-bug

@rd-o
Copy link
Author

rd-o commented Jan 2, 2020

2020-01-02 17:07:49.992 DEBUG 3329 --- [nboundChannel-6] org.springframework.web.SimpLogging      : Cleaning up connection state for session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.992 TRACE 3329 --- [tboundChannel-8] org.springframework.web.SimpLogging      : Encoding STOMP ERROR, headers={message=[failed to forward heart-beat in session 81b3512603de443c92c3d65c63756df7]}
2020-01-02 17:07:49.993 DEBUG 3329 --- [nboundChannel-6] org.springframework.web.SimpLogging      : Closing TCP connection in session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.993 DEBUG 3329 --- [ent-scheduler-2] org.springframework.web.SimpLogging      : Cleaning up connection state for session 81b3512603de443c92c3d65c63756df7
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:failed to forward heart-beat in session 81b3512603de443c92c3d6...(truncated)'
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='a["ERROR\nmessage:failed to forward heart-beat in session 81b3512603de443c92c3d6...(truncated)'
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-9] org.springframework.web.SimpLogging      : Encoding STOMP ERROR, headers={message=[Connection to broker closed.]}
2020-01-02 17:07:49.994 TRACE 3329 --- [tboundChannel-8] o.s.w.s.adapter.NativeWebSocketSession   : Sending TextMessage payload=[a["ERROR\n..], byteCount=120, last=true], StandardWebSocketSession[id=40594c37-39f4-75e0-9927-9216996c87d3, uri=ws://localhost:4321/gs-guide-websocket/236/81b3512603de443c92c3d65c63756df7/websocket]
2020-01-02 17:07:49.995 TRACE 3329 --- [tboundChannel-9] .s.h.ConcurrentWebSocketSessionDecorator : Another send already in progress: session id '81b3512603de443c92c3d65c63756df7':, "in-progress" send time 2 (ms), buffer size 62 bytes
2020-01-02 17:07:49.995 TRACE 3329 --- [tboundChannel-8] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]'
2020-01-02 17:07:49.996 DEBUG 3329 --- [tboundChannel-9] s.w.s.s.t.s.WebSocketServerSockJsSession : Closing SockJS session 81b3512603de443c92c3d65c63756df7 with CloseStatus[code=1002, reason=null]

@wilkinsona
Copy link
Member

This may be a symptom of AMQ-4710. @rd-o do you see the same problem with a different broker?

@spring-projects-issues
Copy link
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues
Copy link
Collaborator

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@kandonorbert
Copy link

We are facing the same issue after updating the Spring Boot version from "1.5.12.RELEASE" to "2.1.9.RELEASE". ActiveMQ version is 5.15.3.
Randomly we get exception in the logs which states that the connection was lost (code 1002). The ActiveMQ is up and running, no error in the logs. Looks like only the SpringBoot application server restart fixes the connection, the connection is not reestablished without server restart.
We had no such problems in the past 2.5 years with the old SpringBoot version, and with the same ActiveMQ instance.
Now we are planning to go live with the new Spring Boot, however this issue looks like a very risky one.
Any feedback would be appreciated.

@wilkinsona
Copy link
Member

1002 indicates that the connection was closed due to a protocol error. Without some more information, the code alone isn't enough to confirm that it's the same issue as was originally reported here. If you can capture the problem with debug or trace logging enabled, that may help to determine the nature of the error. If you manage to do that or you can provide a minimal sample that reproduces the problem, please open a new issue and we can investigate.

@lrozenblyum
Copy link

@kandonorbert, @rd-o were you able to localize the problem/apply some solution for it?
It looks like we're affected by a similar random 1002 issue after upgrading

Spring WebSocket: 4.3.17.Release -> 5.3.3

@rd-o
Copy link
Author

rd-o commented May 28, 2021

@lrozenblyum

I couldn't find a solution for this

@jbakoc1
Copy link

jbakoc1 commented Oct 1, 2021

@rd-o I have the same issue and I believe it is related to the Page Visibility functionality https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API
After some time the browser suspends the tab and the heartbeat is not processed.
But I don't know why we are getting 1002, instead of 1001.
@wilkinsona can we reopen the issue and provide you with all the info you need?

@wilkinsona
Copy link
Member

wilkinsona commented Oct 1, 2021

But I don't know why we are getting 1002, instead of 1001

This sort of protocol level problem is unlikely to be caused by Spring Boot as it's only involved in configuring some of the components and not in their inter-communication. If your client is receiving a 1002, it's likely to be coming from the STOMP broker. Which broker are you using?

@wilkinsona wilkinsona reopened this Oct 1, 2021
@jbakoc1
Copy link

jbakoc1 commented Oct 1, 2021

RabbitMQ 3.8.16 with STOMP plugin

@jbakoc1
Copy link

jbakoc1 commented Oct 1, 2021

But I don't know why we are getting 1002, instead of 1001

This sort of protocol level problem is unlikely to be caused by Spring Boot as it's only involved in configuring some of the components and not in their inter-communication. If your client is receiving a 1002, it's likely to be coming from the STOMP broker. Which broker are you using?

What kind of configurations you mean. There are no firewalls involved between spring host & the rabbit broker and the default tcp port 61613 is open between them. Can you point to the right direction?

@wilkinsona
Copy link
Member

I mean the configuration of Spring Framework's STOMP support.

Can you reproduce the problem with debug or trace level logging enabled both in your Boot app and in the Rabbit broker? It's not yet clear to me why you think this is a Spring Boot problem rather than a problem in one of the other components that's involved. Seeing some logs that capture the problem occurring should help to narrow down the cause.

@jbakoc1
Copy link

jbakoc1 commented Oct 1, 2021

@wilkinsona here is the log from spring boot. I am trying to reach the rabbit admin to get the logs from rabbit...
spring-trace-logs.txt

@wilkinsona
Copy link
Member

Thanks for the logs.

It looks like the app has received a DISCONNECT from the client:

2021-10-01 11:41:08,042 TRACE org.springframework.web.socket.messaging.StompSubProtocolHandler [https-openssl-apr-443-exec-2] From client: DISCONNECT receipt=close-2 session=5d1d057a-c5b2-2105-a087-098804f469d7

It forwards this to the broker:

2021-10-01 11:41:08,043 DEBUG org.springframework.core.log.CompositeLog [clientInboundChannel-8] Forwarding DISCONNECT receipt=close-2 session=5d1d057a-c5b2-2105-a087-098804f469d7

The connection to the broker is then closed:

2021-10-01 11:41:08,047 DEBUG org.springframework.core.log.CompositeLog [tcp-client-scheduler-1] TCP connection to broker closed in session 5d1d057a-c5b2-2105-a087-098804f469d7

And this results in an error being sent to the client:

2021-10-01 11:41:08,049 TRACE org.springframework.core.log.CompositeLog [clientOutboundChannel-7] Encoding STOMP ERROR, headers={message=[Connection to broker closed.]}

It looks to me like something is going wrong in the handling of the DISCONNECT. Either the connection is being closed when it should not have been, or the relay isn't coping as it should with the connection being closed. This code is part of Spring Framework so we'll transfer this issue so that the Framework team can take a look.

@snicoll snicoll transferred this issue from spring-projects/spring-boot Oct 1, 2021
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Oct 1, 2021
@rd-o
Copy link
Author

rd-o commented Oct 1, 2021

@rd-o I have the same issue and I believe it is related to the Page Visibility functionality https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API After some time the browser suspends the tab and the heartbeat is not processed. But I don't know why we are getting 1002, instead of 1001.

@jbakoc1 In my case the client and server are using Spring Boot

@jbakoc1
Copy link

jbakoc1 commented Oct 8, 2021

Hey @snicoll do we have any update on the issue?

@rstoyanchev rstoyanchev added the in: web Issues in web modules (web, webmvc, webflux, websocket) label Nov 10, 2021
@itunic
Copy link

itunic commented Nov 9, 2022

hello ,Is there a solution to this problem?

@mohammed-eladly-afaqy
Copy link

Hello,
is there any update for this issue?

@fabioportieri
Copy link

same problem, with spring boot 2.7.3 and rabbitmq 3.11, angular as client
is there a workaround? maybe implementing a custom heartbeats mechanism?

@fabioportieri
Copy link

fabioportieri commented Apr 5, 2023

i think i may have found the reason, at least in my case, given this log:

2023-04-05T16:50:36.608+02:00 TRACE 11760 --- [boundChannel-99] o.s.w.s.adapter.NativeWebSocketSession   : Sending TextMessage payload=[a["\n"]], byteCount=7, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket] 
2023-04-05T16:50:41.577+02:00 DEBUG 11760 --- [ient-loop-nio-7] r.n.resources.PooledConnectionProvider   : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Channel closed, now: 1 active connections, 0 inactive connections and 0 pending acquire requests. 
2023-04-05T16:50:41.579+02:00 DEBUG 11760 --- [ient-loop-nio-7] reactor.netty.ReactorNetty               : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Non Removed handler: StompMessageDecoder, context: ChannelHandlerContext(StompMessageDecoder, [id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)} 
2023-04-05T16:50:41.579+02:00 DEBUG 11760 --- [ient-loop-nio-7] reactor.netty.ReactorNetty               : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] Non Removed handler: reactor.left.onChannelReadIdle, context: ChannelHandlerContext(reactor.left.onChannelReadIdle, [id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)} 
2023-04-05T16:50:41.580+02:00 DEBUG 11760 --- [ient-loop-nio-7] r.n.r.DefaultPooledConnectionProvider    : [82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673] onStateChange(ChannelOperations{PooledConnection{channel=[id: 0x82f5cc3f, L:/127.0.0.1:64122 ! R:localhost/127.0.0.1:15673]}}, [disconnecting]) 
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Preparing to write SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]' 
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='a["ERROR\nmessage:Connection to broker closed.\ncontent-length:0\n\n\u0000"]' 
2023-04-05T16:50:41.581+02:00 TRACE 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession   : Sending TextMessage payload=[a["ERROR\n..], byteCount=76, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket] 
2023-04-05T16:50:41.582+02:00 DEBUG 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Closing SockJS session ajdkf4vv with CloseStatus[code=1002, reason=null] 
2023-04-05T16:50:41.582+02:00 TRACE 11760 --- [oundChannel-100] s.w.s.s.t.s.WebSocketServerSockJsSession : Writing SockJsFrame content='c[1002,""]' 
2023-04-05T16:50:41.582+02:00 TRACE 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession   : Sending TextMessage payload=[c[1002,""]], byteCount=10, last=true], StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket] 
2023-04-05T16:50:41.582+02:00 DEBUG 11760 --- [oundChannel-100] o.s.w.s.adapter.NativeWebSocketSession   : Closing StandardWebSocketSession[id=cb107f0a-8291-a48b-62fe-0daf796a5f1b, uri=/websocket/notificationFE/604/ajdkf4vv/websocket] 
2023-04-05T16:50:41.585+02:00 DEBUG 11760 --- [oundChannel-100] s.w.s.h.LoggingWebSocketHandlerDecorator : WebSocketServerSockJsSession[id=ajdkf4vv] closed with CloseStatus[code=1002, reason=null] 
2023-04-05T16:50:41.585+02:00 DEBUG 11760 --- [oundChannel-100] o.s.w.s.m.SubProtocolWebSocketHandler    : Clearing session ajdkf4vv 
2023-04-05T16:50:41.586+02:00 DEBUG 11760 --- [oundChannel-100] i.d.m.w.w.WebsocketEventsListener        : Websocket session disconnected: ajdkf4vv 

it appears the culprit here is netty tcp client, most likely there is a timeout setting for idle connections, that in turns close the websocket, more information here:
reactor/reactor-netty#612

EDIT
i tried patching my ReactorNettyTcpClient, setting this pool:

    this.poolResources = ConnectionProvider.builder("tcp-client-pool").maxConnections(10000)
            .maxIdleTime(Duration.ofSeconds(Integer.MAX_VALUE))
            .disposeTimeout(Duration.ofSeconds(Integer.MAX_VALUE)).disposeInactivePoolsInBackground(Duration.ofSeconds(Integer.MAX_VALUE), Duration.ofSeconds(Integer.MAX_VALUE))
            .maxLifeTime(Duration.ofSeconds(Integer.MAX_VALUE)).build();

but it didn't have any effect, aniway the trace log is this:

2023-04-05T17:53:50.771+02:00 DEBUG 18280 --- [ient-loop-nio-3] r.n.resources.PooledConnectionProvider   : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Channel closed, now: 1 active connections, 0 inactive connections and 0 pending acquire requests. 
2023-04-05T17:53:50.772+02:00 DEBUG 18280 --- [ient-loop-nio-3] reactor.netty.ReactorNetty               : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Non Removed handler: StompMessageDecoder, context: ChannelHandlerContext(StompMessageDecoder, [id: 0xb874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)} 
2023-04-05T17:53:50.772+02:00 DEBUG 18280 --- [ient-loop-nio-3] reactor.netty.ReactorNetty               : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Non Removed handler: reactor.left.onChannelReadIdle, context: ChannelHandlerContext(reactor.left.onChannelReadIdle, [id: 0xb874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673]), pipeline: DefaultChannelPipeline{(reactor.left.onChannelReadIdle = reactor.netty.ReactorNetty$InboundIdleStateHandler), (StompMessageDecoder = org.springframework.messaging.tcp.reactor.ReactorNettyTcpClient$StompMessageDecoder), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)} 
2023-04-05T17:53:50.775+02:00 TRACE 18280 --- [ient-loop-nio-3] reactor.netty.channel.ChannelOperations  : [b874ae11, L:/127.0.0.1:57159 ! R:localhost/127.0.0.1:15673] Disposing ChannelOperation from a channel 

java.lang.Exception: ChannelOperation terminal stack
	at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:465)
	at reactor.netty.channel.ChannelOperations.onInboundClose(ChannelOperations.java:434)
	at reactor.netty.channel.ChannelOperationsHandler.channelInactive(ChannelOperationsHandler.java:73)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392)
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81)
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)

currently trying to figure out why the channel get disposed

@fabioportieri
Copy link

i solved my issue, actually there was not need to mess with netty/reactor or spring, since the problem was in my rabbitmq broker, the problem went away once i configured it with

tcp_listen_options.keepalive = true

@snicoll
Copy link
Member

snicoll commented Dec 12, 2023

@rd-o and those affected, can you please check if configuring the broker to keep the connection alive fixes the issue for you as well?

@snicoll snicoll added the status: waiting-for-feedback We need additional information before we can continue label Dec 12, 2023
@spring-projects-issues
Copy link
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Dec 19, 2023
@spring-projects-issues
Copy link
Collaborator

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@spring-projects-issues spring-projects-issues closed this as not planned Won't fix, can't repro, duplicate, stale Dec 26, 2023
@spring-projects-issues spring-projects-issues removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue status: waiting-for-triage An issue we've not yet triaged or decided on labels Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket)
Projects
None yet
Development

No branches or pull requests