KubeCluster hangs if it fails to start dask-scheduler #404

Closed
svetlin-mladenov opened this issue Feb 9, 2022 · 2 comments

svetlin-mladenov commented Feb 9, 2022

What happened:
KubeCluster creation just hangs and waits indefinitely.

What you expected to happen:
An error to be reported

Minimal Complete Verifiable Example:

from dask_kubernetes import KubeCluster, make_pod_spec

# ubuntu:latest does not ship the Dask dependencies, so the scheduler can never start
pod_spec = make_pod_spec(image='ubuntu:latest')
with KubeCluster(pod_spec) as cluster:
    cluster.scale(5)

Anything else we need to know?:
I was trying to start a KubeCluster with a broken image that was missing the required dependencies. KubeCluster prints "Creating scheduler pod on cluster. This may take some time." and then hangs. Initially I thought it was just pulling the image. After 30 minutes I inspected the pod and discovered the issue. The current behavior is a bad user experience. I think an error message should be reported in this case, which would make such failures much easier to debug, especially for new users.
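
Until an error is reported, one possible client-side workaround is to bound the startup time yourself. The sketch below is not part of dask-kubernetes: `call_with_timeout` and `hanging_startup` are hypothetical names, and `hanging_startup` only stands in for a `KubeCluster(pod_spec)` call that never returns. It runs the startup in a daemon thread and raises `TimeoutError` if the call has not finished in time.

```python
import threading
import time


def call_with_timeout(factory, timeout):
    """Run factory() in a daemon thread and raise TimeoutError if it is too slow."""
    outcome = {}

    def runner():
        # Capture either the return value or the exception for the caller.
        try:
            outcome["value"] = factory()
        except BaseException as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The call is still blocked; the daemon thread is abandoned, not stopped.
        raise TimeoutError(
            f"startup did not finish within {timeout} seconds; inspect the scheduler pod"
        )
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def hanging_startup():
    # Hypothetical stand-in for cluster creation that blocks far longer than the timeout.
    time.sleep(5)


try:
    call_with_timeout(hanging_startup, timeout=0.2)
except TimeoutError as exc:
    print(f"Gave up: {exc}")
```

This only stops the caller from waiting. The abandoned thread keeps running and any pod that was already created stays on the cluster, so it would still need to be inspected and cleaned up by hand.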

Environment:

  • Dask version: 2022.01.0
  • Python version: 3.9
  • Operating System: Linux
  • Install method (conda, pip, source): conda

@jacobtomlinson (Member)

Thanks for raising this @svetlin-mladenov. We are currently putting all our effort into #392 which will ultimately replace KubeCluster, so it is unlikely we will dig into bugs here for now.

If someone else has time to pick this up that would be great.

@jacobtomlinson (Member)

The classic KubeCluster was removed in #890. All users will need to migrate to the Dask Operator. Closing.

@jacobtomlinson closed this as not planned on Apr 30, 2024