
Possible Release Today ? #201

Closed
quasiben opened this issue Nov 8, 2021 · 15 comments

Comments

@quasiben
Member

quasiben commented Nov 8, 2021

This morning @pentschev detected a critical bug around UCX and connection handling after dask/distributed#5487 was merged in. How would folks feel if we reverted dask/distributed#5487 and pushed out a release today, before the serialization PRs went in?

cc @jrbourbeau @fjetter @jakirkham

@fjetter
Member

fjetter commented Nov 8, 2021

@jrbourbeau is out today. I have no idea who else is capable of doing a release (I am not). If this is broken I suggest a revert regardless of the release. If it's on main we can even do a non-HEAD release with this revert and will not block any other PRs.

@quasiben
Member Author

quasiben commented Nov 8, 2021

I believe @jakirkham can do a release, but he won't be around until PST wakes up. @pentschev can you submit a PR to revert dask/distributed#5487?

If John is able to, @fjetter are you OK with us pushing a release out?

cc @martindurant @jsignell in case they have comments as well.

@pentschev
Member

I don't expect this to block Distributed's CI; so far I've only been able to reproduce this with 4+ GPUs, and gpuCI is limited to single-GPU only today.

@pentschev
Member

Opened dask/distributed#5505 to revert the PR now.

@fjetter
Member

fjetter commented Nov 8, 2021

If John is able to, @fjetter are you OK with us pushing a release out?

I'm not thrilled about this, but I won't resist. I'm just on the fence since there are so many critical stability problems on main lately and we're still going through with releases; the UCX stuff even affects a smaller subset of people. However, critical bugfixes are worth a hotfix release, and that shouldn't hurt anyone.

I can't help but wonder if this could have been avoided if we changed our release process, e.g. not releasing anything that hasn't been on main for X days, or consulting some external tests before releasing?
see also #200

@quasiben
Member Author

quasiben commented Nov 8, 2021

I'm not thrilled about this, but I won't resist. I'm just on the fence since there are so many critical stability problems on main lately and we're still going through with releases; the UCX stuff even affects a smaller subset of people. However, critical bugfixes are worth a hotfix release, and that shouldn't hurt anyone.

I agree with this generally. This week, however, is when RAPIDS starts its burndown cycle, and we are pinning to the latest Dask release. I think this is one where we probably should have tested more on our end before merging.

@pentschev
Member

I can't help but wonder if this could've been avoidable if we changed our release process, e.g. do not release anything that hasn't been on main for X days ...

Not release process-wide, but this could have been avoided if we had multi-GPU testing in CI, which is a challenge I've been trying to keep up with by running my own "multi-GPU CI" nightly.

... or if there are any external tests we should consult before releasing?

I think this is one where we probably should have tested more on our end before merging in.

Again, this is a problem with the limitations of CI. In a perfect world I would love to have multi-GPU CI available, but unfortunately that isn't a reality today. Thus, I'm trying to keep up with those issues myself, but lately I've been working on fixing so many UCX issues (not all in Dask/Distributed) that this one sat in my queue for a few days before I got to debug it and find the root of the problem. Apologies for the trouble I caused here.
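For context, the kind of gate a multi-GPU test suite needs might look like the sketch below. This is a hypothetical helper, not code from dask or dask-cuda; it assumes `nvidia-smi` is the source of truth for visible GPUs, which is why a single-GPU gpuCI runner could never exercise the 4+ GPU path that triggered this bug.

```python
# Hypothetical multi-GPU test gate: count visible NVIDIA GPUs so tests
# that need several devices can be skipped on under-provisioned CI runners.
import shutil
import subprocess


def gpu_count() -> int:
    """Return the number of visible NVIDIA GPUs, or 0 if undetectable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    try:
        out = subprocess.check_output(["nvidia-smi", "--list-gpus"], text=True)
    except (subprocess.CalledProcessError, OSError):
        return 0
    return len([line for line in out.splitlines() if line.strip()])


# In a pytest suite this would typically become a skip marker, e.g.:
# requires_4_gpus = pytest.mark.skipif(
#     gpu_count() < 4, reason="bug only reproduces with 4+ GPUs"
# )
```

On a single-GPU (or GPU-less) runner the marked tests are silently skipped, which is exactly why this class of bug only showed up outside CI.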

@fjetter
Member

fjetter commented Nov 8, 2021

apologies for the trouble I caused here.

Don't worry. I'm just wondering how we can improve for the future.

@douglasdavis
Member

douglasdavis commented Nov 8, 2021 via email

@jsignell
Member

jsignell commented Nov 8, 2021

I'm ok with reverting and releasing. But I agree with @fjetter that once this has been solved, we should follow up about how to fix the release system more holistically (I commented over at #200).

@quasiben
Member Author

quasiben commented Nov 8, 2021

Thanks all. I've merged the revert (dask/distributed#5505), and @jakirkham will be starting the release process shortly.

@jakirkham
Member

Tagged and uploaded to PyPI. Will work on getting out conda-forge packages now.

@jakirkham
Member

conda-forge packages are up. Went a bit slower than expected as there were some CDN issues.

@quasiben
Member Author

quasiben commented Nov 9, 2021

Thanks @jakirkham for getting this in, and thank you folks for the quick responses here. I'll close this issue.

@quasiben quasiben closed this as completed Nov 9, 2021
@jrbourbeau
Member

Looks like I picked a bad day to be out of the office : ) Thank you to everyone here for handling this

7 participants