
Adding TCP Keep Alive to guarantee master-slave communication #740

Closed · wants to merge 1 commit into from

Conversation

@AnotherDevBoy (Contributor) commented Mar 1, 2018

I am running distributed Locust behind a firewall that closes idle connections.

When I run tests for a long time, the firewall eventually detects that the master is not sending new commands to the slaves and closes the connection on that port.

Unfortunately, that makes me effectively lose control of the slaves, and the only way to regain control is to restart (not ideal).

I have solved this by adding TCP Keep Alive to ZMQ communications.

I recognize that this might sound a bit too specific to my case (especially if you look at the idle time value I used). However, it is also true that Locust should do its best to prevent communication between master and slave from breaking.

Please share your thoughts on this!
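
For reference, a minimal sketch of what the change looks like (the bind address below is illustrative; the actual commit sets these options on the sockets in locust/rpc/zmqrpc.py):

import zmq

# Enable keep-alive probes on the underlying TCP socket so that
# firewalls which drop idle connections keep seeing traffic.
context = zmq.Context()
receiver = context.socket(zmq.PULL)
receiver.setsockopt(zmq.TCP_KEEPALIVE, 1)
# Send the first probe after 30 seconds of idleness, chosen to be
# shorter than the firewall's idle timeout in my environment.
receiver.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 30)
receiver.bind("tcp://*:5557")  # address/port are illustrative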

@codecov codecov bot commented Mar 1, 2018

Codecov Report

Merging #740 into master will decrease coverage by 0.19%.
The diff coverage is 0%.


@@            Coverage Diff            @@
##           master     #740     +/-   ##
=========================================
- Coverage   65.93%   65.74%   -0.2%     
=========================================
  Files          14       14             
  Lines        1374     1378      +4     
  Branches      214      214             
=========================================
  Hits          906      906             
- Misses        421      425      +4     
  Partials       47       47
Impacted Files Coverage Δ
locust/rpc/zmqrpc.py 34.61% <0%> (-6.3%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@AnotherDevBoy (Contributor, Author)

Here are some packet captures before and after the change.

[packet capture: before]

As you can see, when the connection is closed due to idleness, the next command triggered via the Web UI results in RST packets and the connection remains unusable.

[packet capture: after]

Keep-alives are sent regularly and the connection remains functional.

@heyman (Member) commented Mar 2, 2018

Sounds good, though I'm not 100% sure whether this could have any side effects. How does it affect Locust running on Windows?

@@ -17,9 +17,13 @@ class Server(BaseSocket):
     def __init__(self, host, port):
         context = zmq.Context()
         self.receiver = context.socket(zmq.PULL)
+        self.receiver.setsockopt(zmq.TCP_KEEPALIVE, 1)
+        self.receiver.setsockopt(zmq.TCP_KEEPALIVE_IDLE, 30)
@heyman (Member) commented Mar 2, 2018

Do we need to set TCP_KEEPALIVE_IDLE explicitly here, or could we remove it and let the OS default be used?

@AnotherDevBoy (Contributor, Author)

I think we could use a bit of both.

For Locust to function properly, TCP keep-alive needs to be enabled explicitly, even if it is disabled by the OS default settings.

However, the keep-alive idle time is a different story. In my case, my environment's firewall idle timeout was shorter than the OS default keep-alive idle time, hence I needed to make it explicitly shorter.

Alternatively, we could add another optional command line parameter to set TCP_KEEPALIVE_IDLE when needed.
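
For illustration, a rough sketch of what such a parameter could look like (the flag name and wiring are hypothetical, not part of this PR):

import argparse

# Hypothetical flag; Locust's actual option parsing is not shown here.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--tcp-keepalive-idle",
    type=int,
    default=30,  # seconds of idleness before the first keep-alive probe
    help="Idle time in seconds before TCP keep-alive probes are sent",
)
options = parser.parse_args()

# The value would then be forwarded to the socket setup, e.g.:
# receiver.setsockopt(zmq.TCP_KEEPALIVE_IDLE, options.tcp_keepalive_idle)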

@heyman (Member)

OK! I don't see how it could have any impact on performance, so it's probably fine to set it to 30.

Unless there are cases where turning on keep-alives could have any negative effect, I'd prefer to not expose an additional command line option.

@AnotherDevBoy (Contributor, Author) commented May 9, 2018

Sorry for the wait!

I haven't seen any anomalies when running it on Windows.

Is there any other blocker for this PR to be merged @heyman?

@AnotherDevBoy (Contributor, Author)

[packet capture: keepalive]

Here is a packet capture screenshot of TCP keep-alive working on Windows (running on localhost).

@Jonnymcc (Contributor)

@albertowar I think a recent PR I created, #927, would also solve the issue you were tackling without having to enable TCP keep-alive. If you could test the changes in that PR and they solve the issue, maybe we can close this PR in favor of the newer one.

According to the ZMQ guide, using only one socket and sending heartbeats on it should be sufficient.

"Do heartbeating on the same socket you use for messages, so your heartbeats also act as a keep-alive to stop the network connection from going stale (some firewalls can be unkind to silent connections)."

@AnotherDevBoy (Contributor, Author)

Hey @Jonnymcc, I will try it this week and see if it does the trick. If it works, I am happy to take whichever gets merged first.

I have been waiting for this for a while, so my only concern is that the maintainers might take even longer to review your PR (given that it contains more changes).

I will keep an eye on it!

@AnotherDevBoy (Contributor, Author)

Closing in favor of: #927

Cheers @Jonnymcc
