Install Dask + Distributed from main #546

Merged: 2 commits, Mar 10, 2021


jakirkham
Member

Dask and Distributed recently dropped their `master` branches and switched to `main`, so this updates the install steps to use `main` instead.
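The install step itself is not shown in this thread. As a minimal sketch, assuming the CI script installs both packages with pip straight from GitHub, the change amounts to swapping the branch suffix in pip's VCS URL. The helper name `pip_install_args` is hypothetical and only illustrates the idea.

```python
# Hypothetical helper: builds the pip arguments for installing Dask and
# Distributed from a given branch. The actual CI script is not shown in
# this thread, so the exact command it runs is an assumption.
REPOS = {
    "dask": "https://github.com/dask/dask.git",
    "distributed": "https://github.com/dask/distributed.git",
}


def pip_install_args(branch="main"):
    """Return a pip command (as an argument list) targeting `branch`."""
    args = ["pip", "install", "--upgrade"]
    for url in REPOS.values():
        # pip's VCS syntax: git+<repo url>@<branch, tag, or commit>
        args.append(f"git+{url}@{branch}")
    return args


print(" ".join(pip_install_args()))
```

Keeping the branch name as a parameter means a future rename only touches one default value.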

@jakirkham requested a review from a team as a code owner March 8, 2021 20:36
@github-actions bot added the `gpuCI` label Mar 8, 2021
@jakirkham added the `3 - Ready for Review`, `non-breaking`, and `bug` labels Mar 8, 2021
@jakirkham
Member Author

@gpucibot merge

@jakirkham added the `5 - Ready to Merge` label and removed the `3 - Ready for Review` label Mar 8, 2021
@jakirkham
Member Author

Seems like we are getting some explicit comms test failures. @rjzamora would you be able to take a look? 🙂

@pentschev
Member

The issues seem to be GPU-related in CI:

22:30:30 [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE:
22:30:31 Coverage.py warning: --include is ignored because --source is set (include-ignored)
22:30:31 Coverage.py warning: --include is ignored because --source is set (include-ignored)
22:30:31 Unable to start CUDA Context
22:30:31 Traceback (most recent call last):
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 237, in initialize
22:30:31     self.cuInit(0)
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 300, in safe_cuda_api_call
22:30:31     self._check_error(fname, retcode)
22:30:31   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/cudadrv/driver.py", line 335, in _check_error
22:30:31     raise CudaAPIError(retcode, msg)
22:30:31 numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

Rerunning to see if this unblocks.

@pentschev
Member

rerun tests

@jakirkham
Member Author

jakirkham commented Mar 9, 2021

Seeing this in the log

10:58:40   File "/opt/conda/envs/rapids/lib/python3.7/site-packages/ucp/core.py", line 628, in recv
10:58:40     ret = await comm.tag_recv(self._ep, buffer, nbytes, tag, name=log)
10:58:40 ucp.exceptions.UCXMsgTruncated: <[Recv #112] ep: 0x7f0e25cde0d8, tag: 0xfa85496d273cdec2, nbytes: 260, type: <class 'numpy.ndarray'>>: length mismatch: 16 (got) != 260 (expected)

Also seeing this

11:05:40   File "/var/lib/jenkins/workspace/rapidsai/gpuci/dask-cuda/prb/dask-cuda-gpu-test/CUDA/10.1/GPU_LABEL/gpu-t4||gpu/OS/ubuntu16.04/PYTHON/3.7/dask_cuda/explicit_comms/dataframe/shuffle.py", line 196, in local_shuffle
11:05:40     out_parts[i] = None
11:05:40 TypeError: 'tuple' object does not support item assignment
11:05:40 FAILED
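The `TypeError` above is what Python raises whenever code assigns to an index of a tuple. A minimal sketch of the failure and one possible workaround follows, assuming `out_parts` arrives as a tuple where the code expected a list. The real `local_shuffle` body is not shown in this thread, and `release_parts` is a hypothetical stand-in, not the project's fix.

```python
def release_parts(out_parts):
    """Hypothetical stand-in for code that clears entries as it consumes them."""
    # Copy into a list so item assignment works even when a tuple is passed in.
    out_parts = list(out_parts)
    consumed = []
    for i, part in enumerate(out_parts):
        consumed.append(part)
        # Presumably the original clears the slot to drop its reference early.
        out_parts[i] = None
    return consumed, out_parts


parts = ("a", "b", "c")

# Assigning into the tuple directly reproduces the error from the log.
try:
    parts[0] = None
except TypeError as exc:
    message = str(exc)

consumed, cleared = release_parts(parts)
```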

@jakirkham
Member Author

Wondering if that last error is related to dask/distributed#4531.

cc @madsbk (in case you have any thoughts here 🙂)

@pentschev
Member

Ah sorry @jakirkham, I missed those errors, but I'm now seeing them on the latest CI run as well. This does look like an issue coming from dask/distributed#4531, so it would be good to have @madsbk look into it.

To unblock CI, what do you think about xfailing those tests @jakirkham ?

@jakirkham requested a review from a team as a code owner March 9, 2021 20:02
@github-actions bot added the `python` label Mar 9, 2021
@jakirkham
Member Author

Sure, marked as xfail. Raised issue (#549) to track this and referenced it in the xfail message.
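For readers unfamiliar with the approach, marking a test as an expected failure with pytest looks roughly like the sketch below. The test name and the reason text are illustrative assumptions; the thread only establishes that the real message cites #549.

```python
import pytest


# Assumption: the reason text is illustrative. The actual message used in
# the PR is not shown in this thread beyond the fact that it cites #549.
@pytest.mark.xfail(reason="explicit comms failing with Dask/Distributed main; see #549")
def test_explicit_comms_placeholder():
    # Hypothetical stand-in for one of the failing explicit-comms tests.
    raise TypeError("'tuple' object does not support item assignment")
```

With this marker pytest reports the test as `xfail` instead of failing the run, which keeps CI green while the linked issue stays open.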

Member

@pentschev pentschev left a comment

LGTM, thanks @jakirkham !

@pentschev
Member

@jakirkham seems like `test_dask_use_explicit_comms` should also be xfailed. Could you do that as well?

@jakirkham
Member Author

Good catch! Thanks Peter 😄

Sorry, I had missed that earlier. Should be addressed now 🙂

@rapids-bot merged commit 46c24e6 into rapidsai:branch-0.19 Mar 10, 2021
@jakirkham deleted the fix_pip_install branch March 10, 2021 00:11
Labels: `5 - Ready to Merge` (Testing and reviews complete, ready to merge), `bug` (Something isn't working), `gpuCI` (gpuCI issue), `non-breaking` (Non-breaking change), `python` (python code needed)

4 participants