DLT-Multinode: Gateway does not recognize reset of passive node #559
-
Precondition:
Action:
Expected behaviour:
Observed behavior:
Traces with Verbose=1 and LoggingLevel=7 are attached: dlt-log.zip
DLT Package Version: 2.18.9 STABLE, Package Revision: v2.18.9
-
This might be solved by #537. I will test it...
-
This does not solve the issue.
-
Hello @marc-heinlein,
-
Hello @michael-methner, the issue occurs when resetting the passive node. When resetting the gateway node, everything works as expected.
-
Hi all, you can try to trigger a connection establishment from the gateway to the passive node via dlt-passive-node-ctrl.
-
Hmm, to me it sounds strange that the passive node should actively ask the gateway node to reconnect to it. If this is the foreseen strategy for the multinode configuration, it should be built into the passive node, so that it automatically (re-)requests the connection if the gateway node has not connected within a certain timeout (i.e. when the passive node restarts after a reset).
Regarding dlt-passive-node-ctrl: how could I call this on the passive node side? I cannot see any argument that allows me to specify the IP address of the gateway node. In any case, this approach sounds to me like "create an external supervisor for 'gateway-to-passive' connections".
Overall, it would be more straightforward for the gateway node to simply trigger a reconnection once it detects that the passive node is no longer there (maybe a cyclic control message could be used for such an alive check).
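To make the alive-check idea concrete, here is a minimal sketch of the bookkeeping such a check would need. It is not dlt-daemon code: `AliveMonitor`, `interval_s` and `max_missed` are hypothetical names chosen for illustration, and it assumes the gateway sends a cyclic control message and records when the passive node answers.

```python
from dataclasses import dataclass


@dataclass
class AliveMonitor:
    """Tracks replies to a cyclic control message (hypothetical helper)."""

    interval_s: float           # how often the gateway sends an alive request
    max_missed: int             # unanswered intervals tolerated before reconnecting
    last_reply_s: float = 0.0   # timestamp of the most recent reply

    def on_reply(self, now_s: float) -> None:
        # Call this whenever the passive node answers an alive request.
        self.last_reply_s = now_s

    def needs_reconnect(self, now_s: float) -> bool:
        # The peer counts as gone once more than max_missed intervals
        # have passed without any reply.
        return (now_s - self.last_reply_s) > self.interval_s * self.max_missed


monitor = AliveMonitor(interval_s=1.0, max_missed=3)
monitor.on_reply(now_s=10.0)
print(monitor.needs_reconnect(now_s=12.0))  # 2.0 s of silence, within the limit
print(monitor.needs_reconnect(now_s=14.5))  # 4.5 s of silence, beyond the limit
```

The timestamps are passed in explicitly so the logic can be exercised without waiting on a real clock; a real gateway would feed it a monotonic time source.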
-
Hi, dlt-passive-node-ctrl should be run on the gateway node, not on the passive node. Regarding the automatic reconnect attempt from the passive node: from my point of view, it is not reasonable. "create an external supervisor, for 'gateway-to-passive' connections" <== I would say yes to this point.
-
Why not just let the gateway node perform a (re-)connect for as long as it detects that it cannot communicate with the passive node? This should be done independently of the reason for the disconnect.
-
Closing, as this is not a bug but the specific DLT mechanism for gateway <-> passive node.
-
Unfortunately this issue has been closed on the grounds that it is desired behaviour for the gateway to lose the connection to a passive node without triggering any reconnection attempt. This somehow renders the DLT multinode approach for loosely coupled nodes completely useless... Adding some external supervisor on the gateway node that triggers the gateway to reconnect is a bad joke.
-
Hi, in general, all features are very welcome, and the DLT maintainers are happy and willing to review. Happy coding!
-
Hello all,
Sorry if static IPv6 addresses show up; I am testing the IPv6 adaptation for the gateway.
On Docker:
On Host:
Result:
The same holds the other way round. My conclusion is that no matter how the gateway or the passive node is disabled, the connection recovers.
This is the result of what I have checked so far; please kindly check again.
-
Please note that DLT is now at v2.18.10.
-
Transferred to a discussion; please take note.
-
Gateway reset: still reconnect (new connect in fact)
-
I am encountering the same problem that @marc-heinlein has described. I am using dlt-daemon version 2.18.10. The reconnect timeout is set to 0 (infinity).
The experiment by @minminlittleshrimp with containers leads to exactly the same results on my machine. But if the passive node runs on another machine and that machine simply loses power, the gateway does not even notice the disconnect. It looks to me as if it does not initiate a reconnect because it never detects that the passive node is disconnected.
Output of the gateway when the Docker container with the passive node is shut down at the red line: (Please ignore all the ERRORS; there are multiple passive nodes in the system which were not running.)
Output of the gateway when the passive node is running on a separate machine and that machine loses power at the red line:
Does anyone have an idea what could cause this behavior, before I dig into the dlt-daemon code?
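For context, the difference between the two cases matches general TCP behaviour: a container that is shut down closes its socket, so the peer is notified, whereas a machine that loses power sends nothing, and an idle connection can then look healthy indefinitely unless something probes it. Whether this is what happens inside dlt-daemon is an assumption, not something established in this thread. The sketch below only shows the standard socket options for TCP keepalive probing; `enable_keepalive` and its default values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle_s=5, interval_s=2, probes=3):
    """Ask the kernel to probe an idle TCP connection (illustrative helper)."""
    # Turn keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so set only those that exist:
    # idle time before the first probe, gap between probes, and the number of
    # unanswered probes after which the connection is reported as dead.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

With values like these, a silently vanished peer would be reported as an error on the socket after roughly `idle_s + interval_s * probes` seconds, at which point reconnect logic has something to react to.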
-
Hello @fivef,
-
Hey minminlittleshrimp, please see my pull request regarding this issue: #584
Gateway reset: the gateway still reconnects (in fact, it is a new connection).
Passive node reset: the gateway still reconnects, but only within the timeout interval.
No retry: once the timeout expires, the gateway gives up.
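The three cases above amount to a retry loop bounded by a timeout. The following is a simplified model of that policy, not the dlt-daemon implementation: `reconnect`, `try_connect`, `interval_s` and `timeout_s` are hypothetical names, and the only detail taken from this thread is that a timeout of 0 means "retry forever".

```python
import time


def reconnect(try_connect, interval_s, timeout_s, sleep=time.sleep):
    """Retry try_connect() every interval_s seconds.

    Returns True on success, False once timeout_s has elapsed.
    timeout_s == 0 means there is no limit (retry forever).
    """
    waited = 0.0
    while True:
        if try_connect():
            return True
        if timeout_s and waited >= timeout_s:
            return False  # timeout reached: give up, no further retries
        sleep(interval_s)
        waited += interval_s


# Passive node comes back on the third attempt, inside the timeout.
attempts = iter([False, False, True])
print(reconnect(lambda: next(attempts), interval_s=1, timeout_s=5,
                sleep=lambda s: None))

# Passive node never comes back: the gateway gives up after the timeout.
print(reconnect(lambda: False, interval_s=1, timeout_s=3,
                sleep=lambda s: None))
```

The `sleep` parameter is injected only so the example runs instantly. Note that a loop like this helps only if the disconnect is detected in the first place.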