Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop keep alives when worker reconnecting to the scheduler #3493

Merged
merged 3 commits into from
Feb 19, 2020

Conversation

jacobtomlinson
Copy link
Member

Related to #3488.

I don't think this fixes the problem, but I think it should address the CommClosedError exceptions which are seen in the log.

When a worker registers with the scheduler it starts sending keepalive messages via a periodic callback. However that never seems to stop if the connection is broken. If the connection hangs for a long period of time the worker still attempts to send the keepalive messages.

This PR moves the definitions of the BatchedSend and keep-alive callback to the __init__ and stops the callback during the reconnect. This is already being done for the heartbeat.

@jacobtomlinson
Copy link
Member Author

I'm not entirely sure what I've broken here. Assistance would be appreciated.

@mrocklin
Copy link
Member

I'll try to take a look later today

@mrocklin
Copy link
Member

If you run tests with -s you'll see errors in the logs that look like the following

>       self.batched_stream = BatchedSend(interval="2ms", loop=self.loop)
E       AttributeError: 'Worker' object has no attribute 'loop'

I've pushed a patch moving the construction a bit lower, but on my machine there are still some failures. It might make sense to put this construction back in `start`` if there isn't a big reason to move it out.

@jacobtomlinson
Copy link
Member Author

Thanks for pushing to this @mrocklin. The CI seems happy here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants