Possible Release Today? #201
This morning @pentschev detected a critical bug around UCX and connection handling after dask/distributed#5487 was merged in. How would folks feel if we reverted dask/distributed#5487 and pushed out a release today, before the serialization PRs go in?
cc @jrbourbeau @fjetter @jakirkham

Comments
@jrbourbeau is out today. I have no idea who else is capable of doing a release (I am not). If this is broken, I suggest a revert regardless of the release. If it's on main, we can even do a non-HEAD release with this revert, and it will not block any other PRs.
I believe @jakirkham can do a release but won't be around until PST wakes up. @pentschev, can you submit a PR to revert dask/distributed#5487? If John is able to, @fjetter, are you OK with us pushing a release out? @martindurant @jsignell should they have comments as well.
I don't expect this to block Distributed's CI; so far I was only able to reproduce this with 4+ GPUs, and gpuCI is limited to a single GPU today.
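To illustrate why a 4+ GPU bug can slip past single-GPU CI: multi-GPU-only tests are typically guarded so they are skipped when fewer devices are visible. Below is a minimal sketch of such a guard, assuming a pynvml-based device count; the helper and test names are hypothetical, not Distributed's actual code.

```python
# Sketch of guarding a multi-GPU-only test so it is skipped on single-GPU CI
# runners (hypothetical helper/test names; pynvml usage is an assumption).
import pytest


def _visible_gpu_count():
    """Count CUDA devices visible to this process; 0 if NVML is unavailable."""
    try:
        import pynvml

        pynvml.nvmlInit()
        return pynvml.nvmlDeviceGetCount()
    except Exception:
        return 0


@pytest.mark.skipif(_visible_gpu_count() < 4, reason="requires at least 4 GPUs")
def test_ucx_connection_handling_multi_gpu():
    # The reproducer needs a 4+ GPU cluster, so it never runs on single-GPU
    # gpuCI and a regression like this can slip through.
    ...
```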
Opened dask/distributed#5505 to revert the PR now.
I'm not thrilled about this, but I won't resist. I'm just on the fence since there are so many critical stability problems on main lately and we're still going through with releases; the UCX stuff affects a smaller subset of people, even. However, critical bugfixes are worth a hotfix release, and that shouldn't hurt anyone. I can't help but wonder whether this could have been avoided if we changed our release process, e.g. do not release anything that hasn't been on main for X days, or whether there are any external tests we should consult before releasing.
I agree with this generally. This week, however, is when RAPIDS starts its burndown cycle, and we are pinning to the latest Dask release. I think this is one we probably should have tested more on our end before merging.
Not release-process-wise, but this could have been avoided if we had multi-GPU testing in CI, which is a challenge I've been trying to keep up with by running my own "multi-GPU CI" nightly.
Again, this is a problem with the limitations of CI. In a perfect world I would love to have multi-GPU CI available, but unfortunately that isn't a reality today. Thus I'm trying to keep up with those issues myself, but lately I've been working on fixing so many UCX issues (not all of them in Dask/Distributed) that this one sat in my queue for a few days before I got to debug it and find the root of the problem. Apologies for the trouble I caused here.
Don't worry. I'm just wondering how we can improve for the future.
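As a rough illustration of the "nothing is released until it has been on main for X days" idea floated above, a pre-release script could refuse to tag if the newest commit on main is too recent. A minimal sketch, assuming the release is cut from the tip of main; the script and the 7-day threshold are hypothetical, not an existing Dask tool.

```python
# Hypothetical pre-release gate: refuse to cut a release if the newest commit
# on main is younger than MIN_AGE_DAYS (the "X days on main" idea above).
import datetime
import subprocess

MIN_AGE_DAYS = 7  # placeholder for the "X days" in the suggestion


def newest_commit_age_days(ref: str = "main") -> float:
    """Age in days of the most recent commit on `ref` (committer date)."""
    ts = subprocess.check_output(
        ["git", "log", "-1", "--format=%ct", ref], text=True
    ).strip()
    committed = datetime.datetime.fromtimestamp(int(ts))
    return (datetime.datetime.now() - committed).total_seconds() / 86400


if __name__ == "__main__":
    age = newest_commit_age_days()
    if age < MIN_AGE_DAYS:
        raise SystemExit(
            f"Newest commit on main is only {age:.1f} days old "
            f"(< {MIN_AGE_DAYS}); hold the release."
        )
    print("main has been quiet long enough; OK to release.")
```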
Benjamin Zaitlen writes:
> @martindurant @jsignell should they have comments as well

Just chiming in to say @martindurant is out this week.
Thanks all. I've merged in the revert of dask/distributed#5487 (dask/distributed#5505) and @jakirkham will be starting the release process shortly.
Tagged and uploaded to PyPI. Will work on getting out conda-forge packages now.
conda-forge packages are up. Went a bit slower than expected as there were some CDN issues.
Thanks @jakirkham for getting this in. And thank you folks for the quick responses here. I'll close this issue.
Looks like I picked a bad day to be out of the office :) Thank you to everyone here for handling this.