Flaky test_AllProgress #6550

Open
gjoseph92 opened this issue Jun 9, 2022 · 0 comments
Labels
flaky test Intermittent failures on CI.

Comments

@gjoseph92
Collaborator

This feels related to #6361, and could possibly be fixed by #6504 and/or #6427?

https://github.com/dask/distributed/runs/6750812220?check_suite_focus=true#step:11:1754

_______________________________ test_AllProgress _______________________________
args = (), kwds = {}
    @wraps(func)
    def inner(*args, **kwds):
>       with self._recreate_cm():
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:142: in __exit__
next(self.gen)
distributed/utils_test.py:1906: in clean
with check_process_leak(check=processes):
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:142: in __exit__
next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
check = True, check_timeout = 40, term_timeout = 3
@contextmanager
def check_process_leak(
    check: bool = True, check_timeout: float = 40, term_timeout: float = 3
):
    """Terminate any currently-running subprocesses at both the beginning and end of this context

    Parameters
    ----------
    check : bool, optional
        If True, raise AssertionError if any processes survive at the exit
    check_timeout: float, optional
        Wait up to these many seconds for subprocesses to terminate before failing
    term_timeout: float, optional
        After sending SIGTERM to a subprocess, wait up to these many seconds before
        sending SIGKILL
    """
    term_or_kill_active_children(timeout=term_timeout)
    try:
        yield
        if check:
            children = wait_active_children(timeout=check_timeout)
>           assert not children, f"Test leaked subprocesses: {children}"
E               AssertionError: Test leaked subprocesses: [<SpawnProcess name='Dask Worker process (from Nanny)' pid=47619 parent=15412 started daemon>, <SpawnProcess name='Dask Worker process (from Nanny)' pid=47620 parent=15412 started daemon>]
E               assert not [<SpawnProcess name='Dask Worker process (from Nanny)' pid=47619 parent=15412 started daemon>, <SpawnProcess name='Dask Worker process (from Nanny)' pid=47620 parent=15412 started daemon>]
distributed/utils_test.py:1817: AssertionError
----------------------------- Captured stderr call -----------------------------
2022-06-06 06:23:26,122 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55254
2022-06-06 06:23:26,122 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55254
2022-06-06 06:23:26,123 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55255
2022-06-06 06:23:26,123 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:26,123 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,123 - distributed.worker - INFO -               Threads:                          1
2022-06-06 06:23:26,123 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:26,123 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-u1jzba_9
2022-06-06 06:23:26,124 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,212 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55257
2022-06-06 06:23:26,212 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55257
2022-06-06 06:23:26,212 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55258
2022-06-06 06:23:26,213 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:26,213 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,213 - distributed.worker - INFO -               Threads:                          2
2022-06-06 06:23:26,213 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:26,213 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-c8g6zm4s
2022-06-06 06:23:26,214 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,131 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:27,132 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,133 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:27,157 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:27,157 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,158 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:27,787 - distributed.worker - WARNING - Compute Failed
Key:       div-beaac0206246b34d3625d21194e03c13
Function:  div
args:      (1, 0)
kwargs:    {}
Exception: "ZeroDivisionError('division by zero')"
2022-06-06 06:23:30,162 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:55257'.
2022-06-06 06:23:36,051 - distributed.worker - ERROR - Scheduler was unaware of this worker 'tcp://127.0.0.1:55257'. Shutting down.
2022-06-06 06:23:36,056 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55257
2022-06-06 06:23:36,058 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:55254'.
2022-06-06 06:23:36,060 - distributed.worker - ERROR - Scheduler was unaware of this worker 'tcp://127.0.0.1:55254'. Shutting down.
2022-06-06 06:23:36,060 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55254
2022-06-06 06:23:36,072 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:36,073 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:36,077 - distributed.nanny - INFO - Worker closed
2022-06-06 06:23:36,078 - distributed.nanny - INFO - Worker closed
2022-06-06 06:23:37,525 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x142dc7430>>, <Task finished name='Task-50018' coro=<Scheduler.restart() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=CommClosedError("Exception while trying to call remote method 'restart' before comm was established.")>)
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 226, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 897, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 742, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) rpc.restart local=tcp://127.0.0.1:55271 remote=tcp://127.0.0.1:55248>: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/scheduler.py", line 5130, in restart
    resps = await asyncio.wait_for(resps, timeout)
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 218, in All
    result = await tasks.next()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 900, in send_recv_from_rpc
    raise type(e)(
distributed.comm.core.CommClosedError: Exception while trying to call remote method 'restart' before comm was established.
2022-06-06 06:23:39,390 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55276
2022-06-06 06:23:39,390 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55277
2022-06-06 06:23:39,390 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55277
2022-06-06 06:23:39,390 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55276
2022-06-06 06:23:39,390 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55279
2022-06-06 06:23:39,390 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55278
2022-06-06 06:23:39,390 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:39,390 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:39,390 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO -               Threads:                          1
2022-06-06 06:23:39,391 - distributed.worker - INFO -               Threads:                          2
2022-06-06 06:23:39,391 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:39,391 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:39,391 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-x9y623jx
2022-06-06 06:23:39,391 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-9hrdtqpi
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,145 - distributed.client - ERROR - Restart timed out after 10.00 seconds
2022-06-06 06:23:40,753 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:40,753 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,754 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:40,754 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:40,754 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,755 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:41,289 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55276
2022-06-06 06:23:41,290 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:41,290 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55277
2022-06-06 06:23:41,291 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:41,291 - distributed.batched - INFO - Batched Comm Closed <TCP (closed) Worker->Scheduler local=tcp://127.0.0.1:55280 remote=tcp://127.0.0.1:55247>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/batched.py", line 94, in _background_send
    nbytes = yield self.comm.write(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 269, in write
    raise CommClosedError()
distributed.comm.core.CommClosedError
2022-06-06 06:23:41,292 - distributed.batched - INFO - Batched Comm Closed <TCP (closed) Worker->Scheduler local=tcp://127.0.0.1:55281 remote=tcp://127.0.0.1:55247>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/batched.py", line 94, in _background_send
    nbytes = yield self.comm.write(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 269, in write
    raise CommClosedError()
distributed.comm.core.CommClosedError
Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x138321150>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
2022-06-06 06:23:46,527 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x10e665180>>, <Task finished name='Task-12' coro=<Worker.close() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=OSError('Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s')>)
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x138321150>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x12a0f0070>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
2022-06-06 06:23:46,712 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x12a049240>>, <Task finished name='Task-12' coro=<Worker.close() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=OSError('Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s')>)
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x12a0f0070>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
2022-06-06 06:23:46,812 - distributed.worker - INFO - Timed out while trying to connect during heartbeat
2022-06-06 06:23:46,999 - distributed.worker - INFO - Timed out while trying to connect during heartbeat
------------------------------ Captured log call -------------------------------
ERROR    asyncio:base_events.py:1744 Future exception was never retrieved
future: <Future finished exception=CommClosedError("Exception while trying to call remote method 'restart' before comm was established.")>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 226, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 897, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 742, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) rpc.restart local=tcp://127.0.0.1:55270 remote=tcp://127.0.0.1:55249>: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 769, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 231, in quiet
    yield task
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 900, in send_recv_from_rpc
    raise type(e)(
distributed.comm.core.CommClosedError: Exception while trying to call remote method 'restart' before comm was established.
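For reference, a minimal sketch of how the leak check that failed here gets exercised. The `check_process_leak` signature and the restart step are taken from the traceback and logs above; the `LocalCluster`-based setup is only an assumed stand-in for the fixture machinery the real `test_AllProgress` uses, not the actual test code.

```python
# Sketch only: LocalCluster stands in for the test's gen_cluster/fixture setup.
from distributed import Client, LocalCluster
from distributed.utils_test import check_process_leak

if __name__ == "__main__":
    # check=True makes the context manager raise AssertionError on exit if any
    # subprocess (e.g. a nanny-spawned worker) is still alive -- the failure
    # mode reported above.
    with check_process_leak(check=True, check_timeout=40, term_timeout=3):
        with LocalCluster(n_workers=2, processes=True) as cluster:
            with Client(cluster) as client:
                # The captured logs show a Scheduler.restart() that timed out;
                # restarting the nanny-backed workers is what appears to leak
                # the worker processes.
                client.restart()
```

This mirrors the structure of the `clean` fixture, which wraps each test in `check_process_leak(check=processes)` and is where the AssertionError above was raised.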
gjoseph92 added the flaky test (Intermittent failures on CI.) label on Jun 9, 2022