
Add idle cluster cleanup #667

Closed
jacobtomlinson opened this issue Mar 8, 2023 · 5 comments · Fixed by #672

Comments

@jacobtomlinson
Member

Inspired by dask/dask-gateway#687, I think we should add self-cleanup for idle clusters here.

I expect the implementation would involve a configurable idle timeout in the DaskCluster resource. If it is set, the controller would poll the scheduler on a timer to find out whether it is idle. If the scheduler has been idle for longer than the timeout, the controller would delete the DaskCluster resource.
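The timeout logic described above can be sketched as follows. This is a minimal, hypothetical illustration and not code from the operator: `IdleTimeoutTracker`, `observe` and `simulate` are invented names, the poll result is assumed to be a simple idle/busy boolean, and the `idle_timeout` value is assumed to come from the DaskCluster spec. Any busy poll resets the countdown, so a cluster is only deleted after being continuously idle for the whole timeout.

```python
import time
from typing import Callable, List, Optional, Tuple


class IdleTimeoutTracker:
    """Hypothetical controller-side helper: decides when a cluster has been idle too long."""

    def __init__(self, idle_timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout  # seconds; assumed to come from the DaskCluster spec
        self.clock = clock
        self.idle_since: Optional[float] = None  # when we first saw the cluster idle

    def observe(self, is_idle: bool) -> bool:
        """Record one timer-driven poll; return True if the cluster should be deleted."""
        now = self.clock()
        if not is_idle:
            self.idle_since = None  # any activity resets the countdown
            return False
        if self.idle_since is None:
            self.idle_since = now  # first idle poll starts the countdown
        return now - self.idle_since >= self.idle_timeout


def simulate(polls: List[Tuple[float, bool]], idle_timeout: float) -> List[bool]:
    """Replay (timestamp, is_idle) poll results against a fake clock."""
    current = {"t": 0.0}
    tracker = IdleTimeoutTracker(idle_timeout, clock=lambda: current["t"])
    decisions = []
    for t, is_idle in polls:
        current["t"] = t
        decisions.append(tracker.observe(is_idle))
    return decisions


# Idle at t=0, busy at t=200, then idle from t=250 onwards with a 300 s timeout.
print(simulate([(0, True), (200, False), (250, True), (500, True), (560, True)], 300))
# [False, False, False, False, True]
```

Tracking the first idle observation on the controller side means the scheduler only needs to answer "are you idle right now?"; the trade-off is that a controller restart loses `idle_since` and the countdown starts over.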

@haddon-korinek-ent

This would be a massive quality-of-life improvement and cost saver. We have hundreds of ephemeral operator-controlled Dask clusters being spun up every day. The processes that create them can fail abruptly and miss their cleanup hooks, especially when they run on spot instances. There currently isn't a trivial way for us to automate cleanup for these.

@HynekBlahaDev

HynekBlahaDev commented Apr 13, 2023

We are also very interested in this feature.
What would be the best way to identify an idle cluster?
Should we use the call_stack API endpoint? I noticed there are intervals in which it returns empty responses even though the task has not finished yet. Or should we check the timestamp of the last (gather) event?

Update: I noticed there is already an open POC, #672, which uses the not-yet-documented scheduler HTTP API endpoint /v1/check_idle (dask/distributed#7642).
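A controller-side check against that endpoint might look roughly like this. It is a sketch under assumptions, not the code in #672: the full URL (including any `/api` prefix and the `http://<scheduler>:8787` address) is passed in rather than guessed, and the response is assumed to be JSON with an `idle_since` field that is a Unix timestamp while idle and `null` while busy. `fetch_idle_since` and `idle_timeout_exceeded` are hypothetical names.

```python
import json
import time
import urllib.request
from typing import Optional


def fetch_idle_since(check_idle_url: str, timeout: float = 5.0) -> Optional[float]:
    """GET the scheduler's check_idle endpoint and return its "idle_since" value.

    Assumes a JSON body like {"idle_since": 1681380000.0} while idle
    and {"idle_since": null} while busy.
    """
    with urllib.request.urlopen(check_idle_url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("idle_since")


def idle_timeout_exceeded(
    idle_since: Optional[float], idle_timeout: float, now: Optional[float] = None
) -> bool:
    """Return True if the scheduler has been idle for at least idle_timeout seconds."""
    if idle_since is None:
        return False  # scheduler reports it is busy, nothing to clean up
    if now is None:
        now = time.time()  # assumes idle_since is a Unix (wall-clock) timestamp
    return now - idle_since >= idle_timeout
```

Because the scheduler itself reports when it became idle, the controller would not need to remember anything between polls; each timer tick could call `idle_timeout_exceeded(fetch_idle_since(url), timeout)` and delete the DaskCluster resource when it returns True.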

@jacobtomlinson
Member Author

We have hundreds of ephemeral operator controlled dask clusters being spun up every day.

That is super exciting! I'd love to chat sometime about your experience with it.

There isn't a trivial way for us to automate cleanup for these currently.

Yeah that's the motivation behind this issue.

I noticed there is already open POC

The POC #672 just needs a little love to push it over the line. Given the interest in this issue, I'll definitely bump it up my priority list.

If either of you have feedback on the design of that PR I'd love for you to comment there.

@haddon-korinek-ent

That is super exciting! I'd love to chat sometime about your experience with it.

Would be happy to!

@jacobtomlinson
Member Author

Would be happy to!

Awesome. Perhaps the Dask Slack is a good place to start this chat? Would you mind signing up and pinging me over there?


3 participants