Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(example-get-started): track the whole eval dir for simplicity #251

Merged
merged 1 commit into from
Sep 9, 2023

Conversation

shcheklein
Copy link
Member

@shcheklein shcheklein commented Sep 8, 2023

Fixes #249

Simplify eval stage by tracking the whole eval directory with metrics, plots, etc.

Probably next step should be to get rid of the custom name, use log_artifact for models.

Tested it here https://github.com/shcheklein/example-get-started

@@ -27,6 +27,7 @@ jobs:
git fetch origin main:main
fi
dvc pull eval
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need this now (seems a regression, w/o it metrics are not shown in the dvc exp diff below) cc @dberenbaum . Also, I'm not sure I understand why this command is enough to get both revisions.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure dvc-tracked metrics were ever pulled automatically? Up until now, we so often assumed metrics were git-tracked that I'm not sure we ever paid enough attention to this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to fix this if we plan to start recommending dvc-tracked metrics

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's open an issue after this is deployed since it should be easy to reproduce from the deployed repo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I'm not sure at all. Agreed that we need to fix this. Becomes a priority I guess. Since we always though had to cache them I would not be surprised if at some point we were fetching them.

Copy link

@dberenbaum dberenbaum Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought: haven't had a chance to test, but it might work with stage-level metrics but not top-level metrics.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry WDYT about adding this to iterative/dvc#9722?

Comment on lines -60 to +67
Run [`dvc repro`](https://man.dvc.org/repro) to reproduce the
[pipeline](https://dvc.org/doc/commands-reference/pipeline):
Run [`dvc exp run`](https://man.dvc.org/exp/run) to reproduce the
[pipeline](https://dvc.org/doc/user-guide/pipelines) and create a new
[experiment](https://dvc.org/doc/user-guide/experiment-management).

```console
$ dvc repro
Data and pipelines are up to date.
$ dvc exp run
Ran experiment(s): rapid-cane
Experiment results have been applied to your workspace.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on which we recommend here, but we still use dvc repro through the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I plan to change docs also when this is deployed - wdyt? (dvc repro becomes more of a low level command)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I go back and forth 😄 . Now I'm slightly in favor of keeping repro in this trail. Since we add one stage at a time, dvc exp run is a bit awkward to run after each stage is added. It's also unnecessarily heavy and the name doesn't fit the initial workflow we introduce very well.

Most importantly, I don't think it is that impactful either way, so not something worth blocking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, yes. I guess you are right. It can be ultimately a mix dvc repro to do iterations on the pipeline, but dvc exp run when everything is ready for that, wdyt?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's that much benefit to introduce a new command there, but it might make sense there, or as a "next step." Again, not a blocker for me, and probably better to discuss in a docs PR anyway where we can see how it actually looks.

@@ -100,7 +100,7 @@ def main():
test, _ = pickle.load(fd)

# Evaluate train and test datasets.
with Live(EVAL_PATH, cache_images=True, dvcyaml=False) as live:
with Live(EVAL_PATH, dvcyaml=False, report=None) as live:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor, but we could probably drop report=None since it at worst will be an extra file and report=None will be the default in dvclive 3.0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm following that ticket . I can change it when it's merged

Copy link

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shcheklein!

@shcheklein shcheklein merged commit 5aee50b into master Sep 9, 2023
1 check passed
@shcheklein shcheklein deleted the fix-249 branch September 9, 2023 19:35
@shcheklein
Copy link
Member Author

@dberenbaum another downside of tracking metrics is that we can't show a badge now:

Screenshot 2023-09-09 at 3 48 35 PM

not a deal breaker, but a bit sad. I would say metrics file is really small, it would be nice to have it in the Git history

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

example-get-started: cache images
2 participants