This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[Close as Dup] Job Retry Bug in PAI #865

Closed
leelaylay opened this issue Mar 18, 2019 · 2 comments

Comments

@leelaylay
Contributor

leelaylay commented Mar 18, 2019

Short summary about the issue/question:

I submitted a job to the PAI platform while there were not enough computation resources. In that situation the job may fail, wait for some time, and then be resubmitted again and again. The job eventually succeeds in PAI, but NNI cannot get its intermediate or final metrics.

Brief what process you are following: Normal process with a Tuner.

How to reproduce it:
Submit a job that has to be retried for some reason (e.g., low priority or insufficient computation resources).

NNI Environment:

  • nni version: 0.5.2
  • nni mode(local|pai|remote): pai
  • OS: ubuntu 18.04 (wsl)
  • python version: python 3.6.7
  • is conda or virtualenv used?: conda
  • is running in docker?: Nope
@leelaylay
Contributor Author

Similar to #863, but this one usually happens with all tuners and assessors.

@scarlett2018 scarlett2018 changed the title Job Retry Bug in PAI [Close as Dup] Job Retry Bug in PAI Apr 10, 2019
@scarlett2018
Member

Closing as a duplicate of "correctly deal with job retries on openpai" #919.
