feat: configure checkpoint and validation periods without steps [DET-3525] #854

Closed
wants to merge 47 commits into from

Conversation

stoksc (Contributor) commented Jul 9, 2020

Description

Opening for CI while I make final tweaks.

Test Plan

Commentary (optional)

stoksc (Contributor, Author) commented Jul 15, 2020

Closed in favor of #885.

stoksc closed this Jul 15, 2020
eecsliu pushed a commit to eecsliu/determined that referenced this pull request Jun 23, 2023
…-ai#854)

If the synchronous launch takes too long, we may run into a
job monitoring poll which will detect the dispatchID as 404/NOT_FOUND
and stop monitoring. This would cause status updates to stop and
the job to show only QUEUED even though it is executing.
Add a launchInProgress flag to the job monitor to cause 404/NOT_FOUND
to be ignored until the launch has succeeded to avoid this case.
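
The race described in this commit message can be illustrated with a minimal sketch. This is not the project's actual code: the class `JobMonitor`, the methods `start_monitoring`, `launch_succeeded`, `poll_once`, and `is_monitored`, and the `get_status` callback are all hypothetical names chosen for illustration. Only the idea of a launch-in-progress flag that suppresses 404/NOT_FOUND comes from the message itself.

```python
import threading
from typing import Callable, Dict, List

NOT_FOUND = 404  # status code treated as "dispatch unknown" in this sketch


class JobMonitor:
    """Hypothetical monitor that tolerates NOT_FOUND while a launch is pending."""

    def __init__(self, get_status: Callable[[str], int]) -> None:
        # get_status is an assumed callback: dispatch ID -> HTTP-like status code.
        self._get_status = get_status
        self._lock = threading.Lock()
        # Maps dispatch ID -> launch-in-progress flag.
        self._launch_in_progress: Dict[str, bool] = {}

    def start_monitoring(self, dispatch_id: str) -> None:
        # Register the job before the (possibly slow) synchronous launch returns.
        with self._lock:
            self._launch_in_progress[dispatch_id] = True

    def launch_succeeded(self, dispatch_id: str) -> None:
        # From now on, NOT_FOUND means the dispatch is really gone.
        with self._lock:
            if dispatch_id in self._launch_in_progress:
                self._launch_in_progress[dispatch_id] = False

    def is_monitored(self, dispatch_id: str) -> bool:
        with self._lock:
            return dispatch_id in self._launch_in_progress

    def poll_once(self) -> List[str]:
        """Poll each job once and return the IDs dropped from monitoring."""
        dropped: List[str] = []
        with self._lock:
            for dispatch_id, launching in list(self._launch_in_progress.items()):
                if self._get_status(dispatch_id) != NOT_FOUND:
                    continue  # job is known; keep monitoring
                if launching:
                    continue  # launch not finished yet: ignore NOT_FOUND
                del self._launch_in_progress[dispatch_id]
                dropped.append(dispatch_id)
        return dropped
```

Without the flag, a poll that lands between `start_monitoring` and the end of the launch would drop the job on the first NOT_FOUND, which matches the "stuck in QUEUED" symptom described above. Holding the lock during the status call keeps the sketch simple; a real monitor would likely avoid doing network I/O under a lock.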
stoksc pushed a commit that referenced this pull request Jun 26, 2023
eecsliu pushed a commit that referenced this pull request Jun 28, 2023
eecsliu pushed a commit that referenced this pull request Jun 28, 2023
stoksc pushed a commit that referenced this pull request Jul 20, 2023
eecsliu pushed a commit that referenced this pull request Jul 24, 2023
stoksc pushed a commit that referenced this pull request Oct 17, 2023
azhou-determined pushed a commit that referenced this pull request Dec 7, 2023
wes-turner pushed a commit that referenced this pull request Feb 2, 2024
rb-determined-ai pushed a commit that referenced this pull request Feb 29, 2024
amandavialva01 pushed a commit that referenced this pull request Mar 18, 2024
eecsliu pushed a commit that referenced this pull request Apr 18, 2024
eecsliu pushed a commit to determined-ai/determined-release-testing that referenced this pull request Apr 22, 2024
…-ai#854)
