Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add tensorboard delete command to CLI #8227

Merged
merged 27 commits into from
Nov 3, 2023
Merged

feat: Add tensorboard delete command to CLI #8227

merged 27 commits into from
Nov 3, 2023

Conversation

AmanuelAaron
Copy link
Contributor

@AmanuelAaron AmanuelAaron commented Oct 23, 2023

Description

Creating a CLI command for deleting tensorboard files from a given experiment ID. We want some of the existing logic for checkpointGC to perform the file deleteion.

Usage of command would be det tensorboards delete <experiment-id>

Test Plan

  1. Create an experiment
    Example: det e create examples/tutorials/mnist_pytorch/const.yaml examples/tutorials/mnist_pytorch/

  2. Verify that Tensorboard files exist:

$ cd /tmp/determined-cp/
$ find . -name 'tensorboard*' -type d
$ >> <folder-name>/tensorboard
$ cd <folder-name>/tensorboard/experiment/<exp-id>/trial/<exp-id>
$ ls
  1. Delete tensorboard files:
    det e delete-tb-files <exp-id>

  2. Check that files are deleted:

$ cd <folder-name>/tensorboard/experiment/
$ ls (folder for <exp-id> should be gone)

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

Ticket

DET-9865

@netlify
Copy link

netlify bot commented Oct 23, 2023

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 5578b82
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6545567816b4c30008461cc0

master/pkg/tasks/task_gc.go Outdated Show resolved Hide resolved
master/internal/tensorboard_gc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@stoksc stoksc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add an end to end test for this that verifies the TB files are actually deleted. i think just an e2e cpu is fine. if we want per-storage manager tests we can integration test it, i just would like at least one e2e feature test for this.

@cla-bot cla-bot bot removed the cla-signed label Oct 30, 2023
@determined-ci determined-ci requested a review from a team October 30, 2023 21:46
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci added the documentation Improvements or additions to documentation label Oct 30, 2023
@cla-bot cla-bot bot added the cla-signed label Oct 30, 2023
@determined-ci determined-ci removed the documentation Improvements or additions to documentation label Oct 30, 2023
@AmanuelAaron AmanuelAaron marked this pull request as ready for review October 31, 2023 17:30
@AmanuelAaron AmanuelAaron requested review from a team as code owners October 31, 2023 17:30
@cla-bot cla-bot bot removed the cla-signed label Oct 31, 2023
@determined-ci determined-ci added the documentation Improvements or additions to documentation label Nov 1, 2023
e2e_tests/tests/command/test_tensorboard.py Outdated Show resolved Hide resolved
e2e_tests/tests/command/test_tensorboard.py Outdated Show resolved Hide resolved
e2e_tests/tests/command/test_tensorboard.py Outdated Show resolved Hide resolved
Comment on lines 219 to 221
"delete",
delete_tensorboard,
"delete TensorBoard files associate with the proived experiment ID",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azhou-determined or someone from ml sys should review this change.

personally i think it could be a bit confusing for a user. maybe det e delete-tensorboard-files :ID is more clear about what it does but a pain to type.

"tensorboard" in this context so far means "a running tensorboard application" not "tensorboard files" so "det tensorboard delete ..." makes me think it would do something similar to "det tensorboard kill ...".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that det tensorboard delete reads like you want to delete the tensorboard instance.

maybe something like delete-files, or delete-experiment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like det tensorboard delete-files --experiment-id $ID or det experiment delete-tensorboard-files $ID. preferring the latter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion with @wes-turner & @stoksc. We went with det e delete-tb-files

@determined-ai determined-ai deleted a comment from cla-bot bot Nov 1, 2023
@determined-ai determined-ai deleted a comment from cla-bot bot Nov 1, 2023
@determined-ai determined-ai deleted a comment from cla-bot bot Nov 3, 2023
@AmanuelAaron AmanuelAaron requested review from wes-turner and removed request for NicholasBlaskey November 3, 2023 20:03
Copy link
Contributor

@wes-turner wes-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python side 👍

Copy link
Contributor

@stoksc stoksc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

master/internal/api_experiment.go Outdated Show resolved Hide resolved
@cla-bot cla-bot bot removed the cla-signed label Nov 3, 2023
@AmanuelAaron AmanuelAaron force-pushed the tbs_delete branch 2 times, most recently from a0aeea3 to f818046 Compare November 3, 2023 20:17
@cla-bot cla-bot bot added the cla-signed label Nov 3, 2023
@determined-ai determined-ai deleted a comment from cla-bot bot Nov 3, 2023
@determined-ai determined-ai deleted a comment from cla-bot bot Nov 3, 2023
@determined-ai determined-ai deleted a comment from cla-bot bot Nov 3, 2023
@AmanuelAaron AmanuelAaron merged commit 1966373 into main Nov 3, 2023
74 of 84 checks passed
@AmanuelAaron AmanuelAaron deleted the tbs_delete branch November 3, 2023 20:59
return nil, err
}

exp, err := db.ExperimentByID(context.TODO(), int(req.ExperimentId))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this sorry, but we should be passing ctx here and on 2991.

Copy link
Contributor Author

@AmanuelAaron AmanuelAaron Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix: #8332

@dannysauer dannysauer added this to the 0.26.4 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed documentation Improvements or additions to documentation User-facing API Change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants