[Bug]: Misleading error code when multi-replica service provisioning fails #1722

Open
jvstme opened this issue Sep 24, 2024 · 0 comments
Labels
bug Something isn't working

jvstme (Collaborator) commented Sep 24, 2024

Steps to reproduce

Run a two-replica service, but set requirements that will only match one instance. For example:

  1. Provision a single-instance fleet.
> cat fleets/cloud.dstack.yml
type: fleet
name: cloud
nodes: 1

> dstack apply -f fleets/cloud.dstack.yml -y
  2. Wait until the instance is idle.
  3. Try running a two-replica service using just this one instance.
> cat services/httpbin.dstack.yml 
type: service
name: httpbin
image: kennethreitz/httpbin
port: 80
replicas: 2

> dstack apply -f services/httpbin.dstack.yml --reuse -y

Actual behaviour

Run failed with error code TERMINATED_BY_SERVER.
Check CLI, server, and run logs for more details.

Expected behaviour

The run fails with FAILED_TO_START_DUE_TO_NO_CAPACITY, and the CLI shows a relevant message:

All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
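
To illustrate the expected behaviour, here is a minimal sketch of how a run-level reason could be chosen from the per-job reasons. This is not dstack's actual implementation: `JobReason`, `Job`, and `pick_reported_reason` are hypothetical names made up for this example, and only the two reason strings come from this issue. The idea is that jobs terminated by the server are a consequence of the failure rather than its cause, so their reason should be reported only when nothing more specific is available.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class JobReason(Enum):
    # Only the two reasons mentioned in this issue; hypothetical enum.
    FAILED_TO_START_DUE_TO_NO_CAPACITY = "failed_to_start_due_to_no_capacity"
    TERMINATED_BY_SERVER = "terminated_by_server"


@dataclass
class Job:
    # Hypothetical stand-in for a replica job, not a dstack model.
    name: str
    reason: Optional[JobReason]


def pick_reported_reason(jobs: List[Job]) -> Optional[JobReason]:
    """Pick the job reason to surface to the user for a failed run.

    Prefer any reason other than TERMINATED_BY_SERVER, because jobs
    stopped by the server are a side effect of another job failing.
    """
    reasons = [job.reason for job in jobs if job.reason is not None]
    for reason in reasons:
        if reason is not JobReason.TERMINATED_BY_SERVER:
            return reason
    # Fall back to whatever is left (or None if no job has a reason).
    return reasons[0] if reasons else None


# The situation from this issue: replica 0 was stopped by the server
# after replica 1 failed to find capacity.
jobs = [
    Job("httpbin-0-0", JobReason.TERMINATED_BY_SERVER),
    Job("httpbin-0-1", JobReason.FAILED_TO_START_DUE_TO_NO_CAPACITY),
]
reported = pick_reported_reason(jobs)
print(reported.name if reported else None)
```

With a selection rule along these lines, the CLI could map the chosen reason to the no-capacity message quoted above instead of the generic TERMINATED_BY_SERVER one.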

dstack version

master

Server logs

Only relevant logs:

           DEBUG    dstack._internal.server.background.tasks.process_submitted_jobs:98 job(a01b1f)httpbin-0-0: provisioning has started                        
           INFO     dstack._internal.server.background.tasks.process_submitted_jobs:333 The job httpbin-0-0 switched instance cloud-0 status to BUSY           
           INFO     dstack._internal.server.background.tasks.process_submitted_jobs:342 job(a01b1f)httpbin-0-0: now is provisioning on 'cloud-0'               
[21:02:57] DEBUG    dstack._internal.server.background.tasks.process_submitted_jobs:98 job(9c111a)httpbin-0-1: provisioning has started                        
[21:03:01] DEBUG    dstack._internal.server.background.tasks.process_submitted_jobs:98 job(a01b1f)httpbin-0-0: provisioning has started                        
           INFO     dstack._internal.server.background.tasks.process_runs:330 run(af256e)httpbin: run status has changed SUBMITTED -> PROVISIONING             
[21:03:06] DEBUG    dstack._internal.server.background.tasks.process_submitted_jobs:98 job(9c111a)httpbin-0-1: provisioning has started                        
           DEBUG    dstack._internal.server.background.tasks.process_submitted_jobs:213 job(9c111a)httpbin-0-1: reuse instance failed                          
[21:03:07] INFO     dstack._internal.server.services.jobs:262 job(9c111a)httpbin-0-1: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY         
           INFO     dstack._internal.server.background.tasks.process_running_jobs:413 job(a01b1f)httpbin-0-0: now is PULLING                                   
           INFO     dstack._internal.server.background.tasks.process_runs:330 run(af256e)httpbin: run status has changed PROVISIONING -> TERMINATING                                 
[21:03:15] DEBUG    dstack._internal.server.services.jobs:213 job(a01b1f)httpbin-0-0: stopping container                                                       
           INFO     dstack._internal.server.services.jobs:247 job(a01b1f)httpbin-0-0: instance 'cloud-0' has been released, new status is IDLE                 
           INFO     dstack._internal.server.services.jobs:262 job(a01b1f)httpbin-0-0: job status is TERMINATED, reason: TERMINATED_BY_SERVER                   
           INFO     dstack._internal.server.services.runs:933 run(af256e)httpbin: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED
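
For anyone reproducing this, a small helper like the one below can pull the per-job final statuses out of the server log and make the order of events easier to see. It is only a sketch: the regular expression is inferred from the log lines quoted above and may not match other dstack versions, and `job_reasons` is a hypothetical helper name.

```python
import re

# Three lines copied from the server logs above.
LOG = """\
[21:03:07] INFO     dstack._internal.server.services.jobs:262 job(9c111a)httpbin-0-1: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY
           INFO     dstack._internal.server.services.jobs:262 job(a01b1f)httpbin-0-0: job status is TERMINATED, reason: TERMINATED_BY_SERVER
           INFO     dstack._internal.server.services.runs:933 run(af256e)httpbin: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED
"""

# Assumed line shape: job(<id>)<job name>: job status is <STATUS>, reason: <REASON>
JOB_STATUS = re.compile(
    r"job\(\w+\)(?P<job>[\w-]+): job status is (?P<status>\w+), reason: (?P<reason>\w+)"
)


def job_reasons(log_text):
    """Return (job name, status, reason) tuples in the order they were logged."""
    return [
        (m["job"], m["status"], m["reason"])
        for m in JOB_STATUS.finditer(log_text)
    ]


for job, status, reason in job_reasons(LOG):
    print(f"{job}: {status} ({reason})")
```

On the excerpt above this lists httpbin-0-1 failing with FAILED_TO_START_DUE_TO_NO_CAPACITY before httpbin-0-0 is terminated with TERMINATED_BY_SERVER, which is the ordering the expected error message should reflect.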
@jvstme jvstme added the bug Something isn't working label Sep 24, 2024