Network traffic up for 20 minutes after a server restart #135
indrekj added a commit to indrekj/phoenix_pubsub that referenced this issue on Nov 28, 2019. The commit message explains the problem and the fix:

Scenario in which Node2 is replaced by Node3 (essentially a rolling update):

1. Node1 and Node2 are up and synced.
2. Node2 is killed (Node1 starts the permdown grace period for Node2).
3. Node3 is spawned.
4. Node1 sends a heartbeat that includes clocks for Node1 and Node2.
5. Node3 receives the heartbeat. It sees that Node1's clock dominates, because it still contains an entry for Node2, so it requests a transfer from Node1.
6. Node1 sends a transfer ack to Node3.
7. Node3 uses `State#extract` to process the transfer payload, which discards the Node2 values.
8. Everything starts again from step 4 on the next heartbeat.

The loop between steps 4 and 8 lasts until Node1's permdown period for Node2 triggers and Node2 is no longer included in the heartbeat clocks. The solution is to not include down replicas in the heartbeat notifications. This fixes phoenixframework#135.
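To make the feedback loop concrete, here is a small hypothetical Elixir sketch (the module, function, and data shapes are illustrative, not the actual phoenix_pubsub internals): as long as Node1 advertises a clock entry for Node2, Node3 concludes it is behind and requests another transfer.

```elixir
# Hypothetical sketch, not the real phoenix_pubsub data structures: the
# heartbeat clock is modelled here as a plain map of replica => version.
defmodule HeartbeatLoopSketch do
  # The receiver asks for a transfer whenever the remote clock knows about a
  # replica (or a newer version) that the local clock does not.
  def needs_transfer?(local_clock, remote_clock) do
    Enum.any?(remote_clock, fn {replica, version} ->
      Map.get(local_clock, replica, 0) < version
    end)
  end
end

# During the permdown grace period, Node1 still advertises Node2's entry:
node1_clock = %{node1: 42, node2: 17}
# Node3 discarded Node2's values via `State#extract`, so it never catches up:
node3_clock = %{node1: 42, node3: 5}

HeartbeatLoopSketch.needs_transfer?(node3_clock, node1_clock)
#=> true on every heartbeat, until Node1's permdown period for Node2 expires
```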
The same commit (with the identical message) was referenced several more times:
indrekj added a commit to indrekj/phoenix_pubsub that referenced this issue on Nov 28, 2019
urmastalimaa pushed commits to indrekj/phoenix_pubsub that referenced this issue on Nov 29, 2019
indrekj added commits to indrekj/phoenix_pubsub and salemove/phoenix_pubsub that referenced this issue on Dec 4, 2019
chrismccord pushed a commit that referenced this issue on Jan 7, 2020, carrying the same message ("... This fixes #135"), which landed the fix in the upstream repository.
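The thrust of that fix, sketched below in hypothetical Elixir (illustrative names only, not the real phoenix_pubsub code): when assembling the clock map that goes into a heartbeat, drop replicas that are currently marked down, so a replica sitting in its permdown grace period can no longer make new peers believe they are behind.

```elixir
# Hypothetical sketch of the fix's idea: exclude down replicas when building
# the heartbeat clocks. Names and data shapes are illustrative only.
defmodule HeartbeatFixSketch do
  def heartbeat_clocks(clocks, replica_status) do
    clocks
    |> Enum.reject(fn {replica, _version} -> replica_status[replica] == :down end)
    |> Map.new()
  end
end

clocks = %{node1: 42, node2: 17}
status = %{node1: :up, node2: :down}

HeartbeatFixSketch.heartbeat_clocks(clocks, status)
#=> %{node1: 42}
```

With the down replica filtered out, a new node's clock is never dominated by a phantom entry for the replaced node, so the transfer loop described above stops.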
We're using Phoenix Presence (the latest version on the master branch) in a Kubernetes setup with multiple pods.
We've noticed that every time we restart a pod or do a rolling update, network traffic stays elevated for about 20 minutes.
I was able to reproduce it in our beta environment with 10K online connections by restarting one pod:
[screenshot: network traffic graph]
As the graph shows, traffic went up around 11:53 and came back down around 12:14.
I think it's related to the `permdown_period` setting, which defaults to 20 minutes. I tried to reproduce this with just the phoenix_pubsub library, without a web server, but wasn't able to. EDIT: it is related; when I changed it to 10 minutes, network traffic was elevated for only 10 minutes.
I also ran tcpdump inside one pod to see where the traffic was coming from and going to. It was all between the presence servers themselves; I think these are the state synchronization messages.
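For reference, here is a minimal sketch of how that period could be shortened when starting a tracker directly with phoenix_pubsub (MyApp.Tracker and MyApp.PubSub are placeholder names; the callbacks follow the Phoenix.Tracker docs, and `:permdown_period` is the tracker option whose 20-minute default is discussed above):

```elixir
# A minimal sketch, assuming a tracker started directly via Phoenix.Tracker.
# MyApp.Tracker and MyApp.PubSub are placeholder names. :permdown_period is
# given in milliseconds; 600_000 ms = 10 minutes instead of the 20-minute
# default.
defmodule MyApp.Tracker do
  use Phoenix.Tracker

  def start_link(opts) do
    opts = Keyword.merge([name: __MODULE__], opts)
    Phoenix.Tracker.start_link(__MODULE__, opts, opts)
  end

  def init(opts) do
    {:ok, %{pubsub_server: Keyword.fetch!(opts, :pubsub_server)}}
  end

  # No-op diff handler, just to make the sketch compile.
  def handle_diff(_diff, state), do: {:ok, state}
end

# In the application's supervision tree:
# {MyApp.Tracker, [pubsub_server: MyApp.PubSub, permdown_period: 600_000]}
```

Note that a shorter permdown period only narrows the window; the fix referenced above removes the resync loop itself.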
Do you have any suggestions on what to look for or how to gather more information?