
fix: use TF Tensorboard writer by default [DET-3353] #857

Merged
merged 2 commits into from
Jul 9, 2020


aaron276h
Contributor

@aaron276h aaron276h commented Jul 9, 2020

Description

It appears that using the PyTorch TensorBoard writer led to TF events being written in a "strange" state.

Test Plan

Ran several experiments and observed that the event file naming changed: previously there was always a file ending in `1.0`, and now there is not. That file appears to have been the cause of some of the issues.

Going to test this out with a container that doesn't have TF.
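The naming observation above can be checked mechanically by listing the event files in a log directory and flagging those that end in `1.0`. This is a hypothetical helper, not part of Determined; treating any file with `tfevents` in its name as an event file is an assumption based on the standard `events.out.tfevents.*` naming.

```python
import pathlib
import tempfile
from typing import List


def find_suffixed_event_files(log_dir: pathlib.Path, suffix: str = "1.0") -> List[pathlib.Path]:
    """Return event files under log_dir whose names end with `suffix`.

    Hypothetical helper: treats any file with "tfevents" in its name as a
    TensorBoard event file (standard "events.out.tfevents.*" naming).
    """
    return sorted(
        path
        for path in log_dir.rglob("*")
        if path.is_file() and "tfevents" in path.name and path.name.endswith(suffix)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        # Simulate one plain event file and one carrying the "1.0" ending.
        (root / "events.out.tfevents.1594300000.host").touch()
        (root / "events.out.tfevents.1594300001.host.1.0").touch()
        print([p.name for p in find_suffixed_event_files(root)])
```

Running it before and after the change against an experiment's TensorBoard directory would show whether the `1.0` file still appears.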

Contributor

@sidneyw sidneyw left a comment


Looks good!

```diff
@@ -174,15 +174,14 @@ def _scan_checkpoint_directory(checkpoint_dir: str) -> List[Checkpoint]:
     return list(checkpoints.values())


-def move_tf_events(root_dir: str) -> None:
+def move_tf_events(event_dir: pathlib.Path) -> None:
```
Contributor

praise: nice clean up here
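The clean-up switches `move_tf_events` from a string `root_dir` to a `pathlib.Path` `event_dir`. Its body is not shown in the diff, so the following is only a hypothetical sketch of a Path-based helper in the same spirit: `move_event_files`, its `dest_dir` parameter, and the `*tfevents*` glob are all assumptions, not Determined's code.

```python
import pathlib
import shutil
import tempfile
from typing import List


def move_event_files(event_dir: pathlib.Path, dest_dir: pathlib.Path) -> List[pathlib.Path]:
    """Hypothetical sketch: move TF event files from event_dir into dest_dir.

    Assumes event files contain "tfevents" in their name. The layout relative
    to event_dir is preserved. Returns the new paths.
    """
    moved = []
    # Materialize the match list first so moving files cannot disturb iteration.
    for src in sorted(event_dir.rglob("*tfevents*")):
        if not src.is_file():
            continue
        target = dest_dir / src.relative_to(event_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved


if __name__ == "__main__":
    src_root = pathlib.Path(tempfile.mkdtemp())
    dst_root = pathlib.Path(tempfile.mkdtemp())
    (src_root / "train").mkdir()
    (src_root / "train" / "events.out.tfevents.123.host").write_text("x")
    print(move_event_files(src_root, dst_root))
```

Working with `pathlib.Path` end to end avoids the string joins and `os.path` calls a `str` root directory tends to require.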

```diff
@@ -147,15 +147,15 @@ def prepare_tensorboard(
         env, env.experiment_config["checkpoint_storage"], container_path
     )
     try:
-        from determined.tensorboard.metric_writers import pytorch
+        from determined.tensorboard.metric_writers import tensorflow
```
Contributor


non-blocking: as mentioned, just make sure that this works well in containers without TensorFlow installed.

Contributor Author


Checked and it works 🥳
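The reviewer's concern was that importing the TensorFlow writer must not break containers without TensorFlow. The diff only shows the `try:` line, so how the real code handles a missing TensorFlow is not visible here. A minimal, hypothetical sketch of one common pattern (try module names in order, fall back on `ImportError`) is below; `import_first_available` is an illustrative name, and the PyTorch fallback order in the usage is an assumption.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first_available(module_names: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in module_names that imports cleanly, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # The module, or a dependency such as tensorflow, is missing: try the next one.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical preference order: TF-based writer first, PyTorch-based second.
    writer_module = import_first_available([
        "determined.tensorboard.metric_writers.tensorflow",
        "determined.tensorboard.metric_writers.pytorch",
    ])
    print(writer_module)
```

Because `ModuleNotFoundError` is a subclass of `ImportError`, a missing `tensorflow` package surfaces as an `ImportError` from the writer module's own import, and the next candidate is tried.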

@aaron276h aaron276h merged commit 1e707df into determined-ai:master Jul 9, 2020
eecsliu pushed a commit to eecsliu/determined that referenced this pull request Jun 23, 2023
If a dispatch suddenly disappeared from the launcher (404) without any action,
monitoring was dropped without any notification of job completion.
On 404, notify that the job was lost and terminate.
The following pushes referenced this pull request with the same commit message:
stoksc pushed a commit that referenced this pull request Jun 26, 2023
eecsliu pushed a commit that referenced this pull request Jun 28, 2023
eecsliu pushed a commit that referenced this pull request Jun 28, 2023
stoksc pushed a commit that referenced this pull request Jul 20, 2023
eecsliu pushed a commit that referenced this pull request Jul 24, 2023
stoksc pushed a commit that referenced this pull request Oct 17, 2023
azhou-determined pushed a commit that referenced this pull request Dec 7, 2023
wes-turner pushed a commit that referenced this pull request Feb 2, 2024
@dannysauer dannysauer added this to the 0.12.12 milestone Feb 6, 2024
rb-determined-ai pushed a commit that referenced this pull request Feb 29, 2024
amandavialva01 pushed a commit that referenced this pull request Mar 18, 2024
eecsliu pushed a commit that referenced this pull request Apr 18, 2024
eecsliu pushed a commit to determined-ai/determined-release-testing that referenced this pull request Apr 22, 2024
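The referenced commit message describes a monitoring fix: when the launcher answers 404 for a dispatch, the monitor should report the job as lost and stop, instead of silently dropping it. The actual change is not shown here, so this is only a sketch of that control flow; `monitor_dispatch`, `DispatchNotFound`, `poll_status`, `notify`, and the state strings are all hypothetical stand-ins for the launcher query and the job-completion notification.

```python
from typing import Callable


class DispatchNotFound(Exception):
    """Hypothetical: raised by poll_status when the launcher returns HTTP 404."""


def monitor_dispatch(
    poll_status: Callable[[], str],
    notify: Callable[[str], None],
    max_polls: int = 100,
) -> str:
    """Poll a dispatch until it finishes or disappears.

    poll_status returns "RUNNING" while the job is active, a terminal state
    such as "COMPLETED" when done, and raises DispatchNotFound on a 404.
    A real monitor would also wait between polls; that is omitted here.
    """
    for _ in range(max_polls):
        try:
            state = poll_status()
        except DispatchNotFound:
            # The dispatch vanished without any action: report the job as lost
            # rather than dropping it silently, then stop monitoring.
            notify("LOST")
            return "LOST"
        if state != "RUNNING":
            notify(state)
            return state
    return "RUNNING"


if __name__ == "__main__":
    responses = iter(["RUNNING", "RUNNING"])

    def fake_poll() -> str:
        try:
            return next(responses)
        except StopIteration:
            # Simulate the launcher forgetting the dispatch.
            raise DispatchNotFound()

    events = []
    print(monitor_dispatch(fake_poll, events.append), events)
```

The key design point is that the 404 path ends in the same place as a normal completion: exactly one notification, after which the monitor stops.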
3 participants