Fix DLT-Multinode: Gateway does not recognize reset of passive node #551 #584

Open
wants to merge 1 commit into
base: master
from

Conversation

fivef

@fivef fivef commented Dec 14, 2023

Enable TCP keepalive to detect connections broken by network disconnects. Without it, if no TCP FIN or RST packet is received from the server, the dlt_client never notices that it has been disconnected, so it does not reconnect when the server becomes available again. Also improve the logging output to show which connection failed.
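
For readers unfamiliar with the mechanism, here is a minimal sketch of enabling TCP keepalive on a connected socket on Linux. The helper names `enable_tcp_keepalive` and `keepalive_detect_seconds` and the example timing values are assumptions made for illustration; they are not taken from this PR's diff, and the actual patch may set different options or values.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>   /* setsockopt, SOL_SOCKET, SO_KEEPALIVE */

/* Hypothetical helper for illustration; not the code from this PR.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_tcp_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    /* Ask the kernel to send keepalive probes on this connection. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Seconds of idle time before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Seconds between two consecutive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) < 0)
        return -1;
    /* Number of unanswered probes before the connection is declared dead. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}

/* Approximate time, in seconds, until an idle dead link is reported. */
int keepalive_detect_seconds(int idle_s, int interval_s, int probes)
{
    return idle_s + interval_s * probes;
}
```

With assumed values of 5 s idle, 1 s interval and 3 probes, a silently dropped link would be reported after roughly 8 seconds. Shorter values detect failures faster at the cost of more probe traffic.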

See #551 and the discussion in #559.

To reproduce the issue fixed here, run one dlt_daemon as a gateway on one machine and another dlt_daemon as a passive node on a second machine or in a Docker container. Then disconnect the network cable, or run sudo ifconfig eth0 down followed by sudo ifconfig eth0 up. In the Docker case, you can disconnect the container from the bridge with docker network disconnect <docker network name> <container name> and reconnect it with docker network connect <docker network name> <container name>.

@Bichdao021195
Contributor

Hello Minh

@minminlittleshrimp
Collaborator

Hello @Bichdao021195
Since you are now a DLT maintainer, please take care of this PR.
I will support you.
Regards

@minminlittleshrimp
Collaborator

I tried to configure the passive node in a container, but could not disconnect the network (the tcp6 connection is still there).
Suggestions:

  • Create a bridge network
  • Try to connect docker -> bridge -> host
  • Build dlt on host -> container (~/work/docker/dlt-daemon)

FYI @Bichdao021195

@minminlittleshrimp
Collaborator

The issue can be reproduced on docker:

On host: ECU1, gateway = 1, timeout = 0 (infinity), ECUid=DOCK, port = 3495
On Container(passive node): DOCK, dlt-daemon -p 3495
Passive node: dlt-example-user -n 1000 Hello
Host: dlt-receive -a localhost

Disconnect: docker network disconnect <bridge> <passive node>
Reconnect: docker network connect <bridge> <passive node>

Host: no reaction; it is not aware that the network interface is back up.
dlt-passivenode-ctrl -s: also does not notice; it still reports "DOCK: connected" even when the link is down.

With the patch applied: dlt-daemon comes up and re-initiates the connection, and logs can be seen again on the dlt-receive side.

What next?
Suggestion: check dlt-passivenode-ctrl, and check whether the patch is OK (both the connect message and the not-found message appear; is something wrong here?)
FYI @Bichdao021195
Regards
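
To illustrate why the host stays unaware without the patch: when a link disappears silently, a read on an idle connection reports nothing at all, whereas with keepalive enabled the kernel eventually fails the socket and `recv()` returns -1 with an error such as `ETIMEDOUT`. The sketch below shows one way a reader loop could classify the result of `recv()`. The enum and the function `classify_recv` are hypothetical names for this illustration and are not part of dlt-daemon.

```c
#include <errno.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical classification of a recv() result; not dlt-daemon code. */
enum link_state {
    LINK_OK,             /* data was received */
    LINK_CLOSED_BY_PEER, /* orderly shutdown: peer sent FIN */
    LINK_RETRY,          /* transient condition, try again later */
    LINK_BROKEN          /* connection is dead, reconnect needed */
};

/* n is the return value of recv(); err is errno captured right after it. */
enum link_state classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return LINK_OK;
    if (n == 0)
        return LINK_CLOSED_BY_PEER;
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return LINK_RETRY;
    /* Includes ETIMEDOUT (keepalive probes unanswered) and ECONNRESET (RST). */
    return LINK_BROKEN;
}
```

A caller would pass `recv()`'s return value together with the saved `errno`, and start its reconnect logic on `LINK_CLOSED_BY_PEER` or `LINK_BROKEN`.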

@Bichdao021195
Contributor

Bichdao021195 commented Dec 25, 2023

Hi @minminlittleshrimp
I double-checked dlt-passivenode-ctrl with the code change from this pull request that enables TCP keepalive.

Prerequisites:
  On host: ECU1, gateway = 1, timeout = 0 (infinity), ECUid=DOCK, port = 3495
  On Container(passive node): DOCK, dlt-daemon -p 3495
  Passive node: dlt-example-user -n 1000 Hello
  Host: dlt-receive -a localhost

Observation: use dlt-passivenode-ctrl with the -s option to check the connection status while disconnecting/connecting the network

  Disconnect: docker network disconnect <bridge> <passive node>
        dlt-passivenode-ctrl -s: "DOCK: connected" is still shown while the Gateway Timer tries to connect to passive nodes.
        dlt-passivenode-ctrl -s: "DOCK: disconnected" is shown after some time, once "Connection to dlt gateway broken" is logged.
        On Host:
            [14259.941377]~DLT~12609~DEBUG    ~Gateway Timer
            [14260.433034]~DLT~12609~DEBUG    ~Timer timingpacket
            [14260.938933]~DLT~12609~DEBUG    ~Gateway Timer
            [14261.426814]~DLT~12609~DEBUG    ~Timer timingpacket
            [14261.936103]~DLT~12609~DEBUG    ~Gateway Timer
            [14262.431573]~DLT~12609~DEBUG    ~Timer timingpacket
            [14262.936131]~DLT~12609~DEBUG    ~Gateway Timer
            [14263.425694]~DLT~12609~DEBUG    ~Timer timingpacket
            [14263.473617]~DLT~12609~DEBUG    ~Connection to dlt gateway broken.
            [14263.473650]~DLT~12609~WARNING  ~Connection to passive node lost
            [14263.473652]~DLT~12609~WARNING  ~Try to reconnect.
            [14263.473656]~DLT~12609~WARNING  ~Connection to passive node lost
            [14263.473657]~DLT~12609~INFO     ~Deactivate connection type: 11
            [14264.438378]~DLT~12609~ERROR    ~dlt_client_connect: ERROR: failed to connect to 192.168.2.3:3495! Operation now in progress
  
  Reconnect: docker network connect <bridge> <passive node>
        dlt-passivenode-ctrl -s: "DOCK: connected" is shown immediately.


@minminlittleshrimp minminlittleshrimp marked this pull request as ready for review December 25, 2023 13:37
@fivef
Author

fivef commented Mar 7, 2024

Is there any progress with the review? Or do you need some support?

@minminlittleshrimp
Collaborator

Hello @fivef
I am okay with your patch; let's see Michael's point of view.

@minminlittleshrimp
Collaborator

minminlittleshrimp commented May 8, 2024

Hello @michael-methner
Please review this PR.
The team has already checked it and it works fine.
There is an alternative solution in #581, which we have not tested yet.
Please consider both.
Regards

Development

Successfully merging this pull request may close these issues.

DLT-Multinode: Gateway does not recognize reset of passive node