version 1.25.3 in docker env stuck in "Waiting to connect to: ..." #2519
Comments
Interesting. I just went through all of those changes (thanks for the convenient link) and there really wasn't much that was changed in how things connect. I don't immediately see anything that would explain the change in behavior that you're seeing. The next thing I recommend trying would be using git bisect. |
Ha! Thank you for pointing out git bisect, I haven't used it in at least a year. The first commit after 1.25.2 is where it no longer connects for me: 70c5129. Edit: this points to #2403. In case it's relevant for the CPython thread pooling stuff, I'm using |
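For readers who have not used it, the search that git bisect performs can be sketched in a few lines of Python. This is only an illustration of the idea, not how git implements it. It assumes the commits are ordered oldest to newest, the first one is known good, the last one is known bad, and every commit after the first bad one is also bad. The function name `first_bad` and the placeholder commit names are hypothetical; only 1.25.2, 70c5129 and 1.25.3 come from this thread.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit using a binary search.

    Assumes commits[0] is good, commits[-1] is bad, and that the history
    switches from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Placeholder history: only the endpoints and 70c5129 appear in the thread.
history = ["1.25.2", "70c5129", "placeholder-a", "placeholder-b", "1.25.3"]
# Stand-in for "start the scheduler and worker and see if they connect".
result = first_bad(history, lambda c: history.index(c) >= 1)
```

In practice the `is_bad` check is the manual step: build that commit, start the scheduler and a worker, and see whether the worker connects.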
ok, after reading #2506 I think we can close my ticket here. I am just confirming it independently. |
I know you're using docker so it might be a little hard to get at, but would you be able to try: upping this number to something like 3, 6, or 10? |
No, that's not a problem, I work interactively inside of it. I changed it to 10 and 100 (based on 1.25.3) and it still doesn't connect. So, just upping the thread pool doesn't work. |
Ok, I played around a bit, and I have no idea if that defeats the purpose of PR #2403 or not, but with this change on top of 1.25.3 it connects again:
diff --git a/distributed/comm/tcp.py b/distributed/comm/tcp.py
index 5af63b16..30652cd6 100644
--- a/distributed/comm/tcp.py
+++ b/distributed/comm/tcp.py
@@ -327,16 +327,16 @@ class BaseTCPConnector(Connector, RequireEncryptionMixin):
                                              executor=_executor)
     else:
         _resolver = None
-    client = TCPClient(resolver=_resolver)
     @gen.coroutine
     def connect(self, address, deserialize=True, **connection_args):
         self._check_encryption(address, connection_args)
         ip, port = parse_host_port(address)
         kwargs = self._get_connect_args(**connection_args)
+        client = TCPClient(resolver=BaseTCPConnector._resolver)
         try:
-            stream = yield BaseTCPConnector.client.connect(ip, port,
+            stream = yield client.connect(ip, port,
                                                 max_buffer_size=MAX_BUFFER_SIZE,
                                                 **kwargs)
|
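The diff above moves the client from a class attribute (built once, when the class body runs) to a local variable (built on every connect call). The following sketch only illustrates that timing difference with plain Python. It does not use tornado, and it does not claim to explain why the shared client fails on older tornado versions. `FakeClient`, `SharedConnector`, `PerCallConnector` and `_current_context` are all hypothetical names made up for this example.

```python
# Stand-in for "whatever global state is current" (hypothetical, not a tornado API).
_current_context = "import-time"


class FakeClient:
    """Hypothetical client that remembers the context it was created in."""

    def __init__(self):
        self.context = _current_context


class SharedConnector:
    # Built once, while the class body is executed (i.e. at import time).
    client = FakeClient()

    def connect(self):
        return SharedConnector.client.context


class PerCallConnector:
    def connect(self):
        # Built on every call, so it sees the state at call time.
        client = FakeClient()
        return client.context


# Simulate the program moving on after import.
_current_context = "run-time"

shared_result = SharedConnector().connect()
per_call_result = PerCallConnector().connect()
```

Here the shared client still reports the import-time state, while the per-call client reports the state at the time of the call. Whether something similar is what goes wrong with the class-level TCPClient is an open question in this thread.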
So the problem I was trying to solve was that every time you make a The outer loop of this function (what calls this) is a I'm not really sure what the best course of action is. I can't reproduce this locally, so I can't try any other implementations |
One fix seems to be tornado 5.0.0, but I'm going to try to find a solution that isn't that. |
@mrocklin My PR is incompatible with tornado 4.5.1. When I recreate these issues, pip/conda just installs tornado 5+ and it works fine, but if I specify 4.5.1 - 4.5.3 it doesn't work and I can recreate what these people are seeing. Would you like me to make a PR for reverting? |
Maybe we should add that version check in the condition when making the
ThreadPoolExecutor?
On Sat, Feb 9, 2019 at 12:30 PM Daniel Farrell wrote:
@mrocklin Maybe we should revert my commit until I can figure this out. It seems like this is going to take more time than I thought.
My PR is incompatible with tornado 4.5.1. When I'm recreating these issues pip/conda is just installing tornado 5+ and it works fine, but if I specify 4.5.1 - 4.5.3 it doesn't work and I can recreate what these people are seeing.
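The version check suggested in the reply above could look roughly like the sketch below. It is an assumption-laden illustration, not the change that was merged. The helper name `use_executor_resolver` and the `existing_condition` parameter are hypothetical, and the cutoff of tornado 5 is taken only from the reports in this thread (4.5.1 to 4.5.3 fail, 5+ works). Tornado does expose its version as a tuple in `tornado.version_info`, which is what would be passed in.

```python
from typing import Tuple


def use_executor_resolver(tornado_version_info: Tuple[int, ...],
                          existing_condition: bool = True) -> bool:
    """Decide whether to build the shared ThreadPoolExecutor-backed resolver.

    tornado_version_info: a tuple such as tornado.version_info, e.g. (5, 1, 1, 0).
    existing_condition: whatever check already guards the executor today
                        (hypothetical placeholder for this sketch).
    """
    # Assumption from this thread: only tornado 5 and later works with the
    # shared resolver; 4.5.x should fall back to the old behavior.
    return existing_condition and tornado_version_info >= (5,)


old_tornado = use_executor_resolver((4, 5, 1, 0))
new_tornado = use_executor_resolver((5, 1, 1, 0))
```

Comparing tuples rather than version strings avoids the usual pitfall where "4.10" sorts before "4.5" as text.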
|
So this is confusing, but I think this is the best I can do to recreate this. @mrocklin I made a docker image with distributed before my commit, and tornado 4.5.1. I go into that docker image, which has tornado 4.5.1, and then I use it ('1.25.1+10.ga0d0ed21' is the commit). Everything appears to work fine, but after ~10-20 seconds heartbeats are broken and I get the trace at the bottom printed to my terminal. I discovered this while trying to add code that would check for the tornado version like you suggested. Basically I started getting this error and I was pretty confused. I can't seem to explain why these projects haven't seen errors like this before, though.
|
@haraldschilly can you update to Tornado 5.0 and see if things go away? I'm also curious to know if updating to 5.0 is a concern for you in some way. |
@mrocklin I already updated to 5.1.1, and that works well on cocalc. Here is yet another screenshot, as a follow-up to the above. The root issue was that updating ubuntu caused its python3 tornado distribution package to replace files from an already existing (newer) tornado installation. So, updating again to >= 5 was all I had to do. |
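A quick way to spot this kind of mix-up is to ask Python where a module would be loaded from and which version the installed package metadata reports. The sketch below uses only the standard library (Python 3.8+ for `importlib.metadata`). The helper name `describe_module` is hypothetical, and it assumes the import name and the distribution name are the same, which holds for tornado but not for every package.

```python
import importlib.util
from importlib import metadata


def describe_module(name):
    """Return where `name` would be imported from and its recorded version.

    Returns None if the module cannot be found at all.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    try:
        # Version recorded in the installed distribution's metadata.
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Found on the import path, but no matching distribution metadata.
        version = None
    return {"origin": spec.origin, "version": version}


# Example with a standard-library module so the sketch runs anywhere;
# in the situation above one would call describe_module("tornado").
info = describe_module("json")
missing = describe_module("no_such_module_for_this_sketch")
```

If the reported origin points at a system directory while the metadata version belongs to a newer pip or conda install, the files on disk and the recorded version may not match, which would be consistent with the situation described above.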
Glad to hear the update fixed things for you. Thanks for looking into this @haraldschilly |
I have a situation where I'm basically running the scheduler and workers in a docker environment (it's a project at https://cocalc.com ). With version 1.25.2 it works fine, but starting with 1.25.3 I only get
Waiting to connect to: ...
and, well, it never does. There are only a few changes, though, so maybe this helps to narrow down this regression.
The scheduler is started via
dask-scheduler
and the worker via
dask-worker tcp://localhost:8786 --local-directory ~/dask-worker --nthreads 1 --nprocs 1 --memory-limit 256M