Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: exp init improvements #3071

Merged
merged 18 commits into from
Dec 23, 2021
Merged

ref: exp init improvements #3071

merged 18 commits into from
Dec 23, 2021

Conversation

jorgeorpinel
Copy link
Contributor

@jorgeorpinel jorgeorpinel commented Dec 7, 2021

Follow-up to #3015


@jorgeorpinel jorgeorpinel self-assigned this Dec 7, 2021
@jorgeorpinel jorgeorpinel mentioned this pull request Dec 8, 2021
5 tasks
@jorgeorpinel

This comment has been minimized.

└── src/
```

> Note that `params.yaml` is the only required file (see `dvc params`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note ... for the sake of my education: why is it required?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To parse the parameters and put them into dvc.yaml.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but even if you don't have params an empty file is still required.

Related to iterative/dvc#6446 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also related to iterative/dvc#6605. The params section of dvc.yaml currently expects keys from a file (see below), not a filename itself, so all the parameters need to be defined before creating the stage. If we want to stop requiring this file at the time of dvc exp init, we need to change params to accept filenames.

stages:
  train:
    cmd: python train.py
    deps:
    - data
    - src
    params:
    - params.py:
      - batch_size
      - data_path
      - epochs
      - latent_dim
      - num_samples

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about generating a dvc.yaml without params if there's no parmas.yaml file? This is a core discussion though! Moved to iterative/dvc#6446 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood you, and now I think it's actually more of a docs issue 🤔 . It is possible to use dvc exp init without params. You can either use --explicit and not provide --params, or you can use dvc exp init -i and type n at the params prompt.

The note here is to indicate that if you do use either the default or some other params path, then that path must exist and be parseable to extract the parameters for the stage. So the current note probably needs to be reworded.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UPDATE: Looks like the discussion about empty params.yaml files is in a few core repo tickets now (e.g. iterative/dvc#7138) so I won't keep discussing here (still a bit confused though).

It is possible to use dvc exp init without params...
So the current note probably needs to be reworded.

Clarified in 2273431.

Comment on lines 51 to -45
### The `command` argument

The `command` argument is optional, if you are using `--interactive` mode. The
Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW a question I have is whether to keep this section or move command under Options like we did for targets in https://dvc.org/doc/command-reference/repro#options. I personally like the section more but I remember we discussed it (cc @shcheklein) this and using Options was picked, so for consistency I'd move this under Options as well.

From #3015 (review)

@jorgeorpinel jorgeorpinel marked this pull request as ready for review December 14, 2021 02:43
Copy link
Collaborator

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A couple small notes to help clarify for users what the expectations are for the different paths.

content/docs/command-reference/exp/init.md Outdated Show resolved Hide resolved
@@ -19,43 +18,57 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f]

## Description
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere in the description, it might be useful to explain that dvc exp init by default expects that input data, parameters, and source code paths exist before running an experiment, and that the command is expected to generate models, metrics, and plots.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more or a problem for exp run though, @dberenbaum (already linked form the --run option). But should we state that exp run (and repro for that matter) expect that the stage definition and code are good? Hopefully its evident.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more or a problem for exp run though

Well that's the point of exp init, right? Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.

Doesn't need to be part of this PR. It could also be handled by some of the suggested changes to the core command rather than the docs.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.

I'm not sure. If users need to read that on the cmd ref., is exp init serving as a self-explanatory command? I think those notes should be in the command's output fist (and added to this doc as a result of that product update).

@shcheklein shcheklein temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 16, 2021 02:43 Inactive
@shcheklein
Copy link
Member

looks good to me (I didn't check the details though, since Dave've done this already)

Do we want to add an example? (may be even animation?)

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 21, 2021 10:08 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 21, 2021 10:46 Inactive
@jorgeorpinel
Copy link
Contributor Author

Do we want to add an example?

Yep @shcheklein it's a check box I didn't get to yet ⌛ (could be a separate PR)

may be even animation?

Good idea in general to consider gifs but in this case I think a static console sample with the entire -i output would be better so you can read them calmly. Animations would be good for samples that focus on progress, mutation, etc. Let's keep it in mind.

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 21, 2021 11:34 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 21, 2021 11:34 Inactive
@jorgeorpinel
Copy link
Contributor Author

UPDATE: I added a simple example just to give a sense of the interactive-mode UI. Not sure which features we want to highlight with examples otherwise.

Copy link
Collaborator

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Adding an example is great, but why not focus on a model training example?

@@ -19,43 +18,57 @@ usage: dvc exp init [-h] [-q | -v] [--run] [--interactive] [-f]

## Description
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more or a problem for exp run though

Well that's the point of exp init, right? Better to have users understand what's needed up front than to have them run exp init only to fail on exp run.

Doesn't need to be part of this PR. It could also be handled by some of the suggested changes to the core command rather than the docs.


Let's prepare a raw data ingestion script to start running experiments on it.

```dvc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a model training example? Data ingestion is a nice application to show how this can be useful even for non-training stages, but model training is the primary use case. It's unlikely for someone to run experiments on a data ingestion script.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Dec 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data ingestion example produced a very small dvc.yaml file and then I had that note about using stage add to produce the same. Idk, I thought it was interesting. Anyway, I agree the training scenario is more applicable so I replaced all that now.

by our code in question, so the default is used. Models, metrics, and plots are
omitted as there are none for an ingestion process.

The resulting `dvc.yaml` file is simply:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth mentioning pipelines briefly to explain what's in dvc.yaml.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actively avoiding mixing in that concept here but it is mentioned strategically once in the Description. I did elaborate on this sentence a bit though.

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 22, 2021 21:32 Inactive
@jorgeorpinel
Copy link
Contributor Author

why not focus on a model training example?

OK @dberenbaum . Done in 56f4a46.

Copy link
Collaborator

@dberenbaum dberenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-ref-exp-init-nfacz5bbh December 23, 2021 02:22 Inactive
@jorgeorpinel
Copy link
Contributor Author

Thanks!

@jorgeorpinel
Copy link
Contributor Author

Looks like the failing link check is a false positive. I'll see if this is already reported to the link checker repo. Merging 🔀

Comment on lines 1 to +3
# exp init

Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run
[experiments](/doc/user-guide/experiment-management).
Quickly setup any project to use [DVC Experiments].
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this (per last check box in this PR desc.) @dberenbaum @skshetry . Currently dvc exp init -h prints Initialize DVC in the current directory. Expects directory to be a Git repository... Do we want to update that text?

Copy link
Collaborator

@dberenbaum dberenbaum Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the dvc init -h text 😄 . dvc exp init -h says Initialize experiments. However, I think we agreed this new language is better, so I'm happy to put in a PR to update in the core repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the dvc init -h text 😄

yep 🤦 sry. OK, thanks for the PR

iesahin pushed a commit that referenced this pull request Apr 11, 2022
* ref: first copy edits to exp init

* ref: clarify exp init explanations

* ref: clarify `exp init` option descriptions

* ref: re-describe `exp init` + reorder in nav and `exp` help
per #3071 (review)

* ref: clarify params.yaml is needed only with defaults in params.yaml
per #3071 (comment)

* ref: clarify what --interactive prompts user for
per #3071 (review)

* ref: link from exp init to config section and
mention --explicit avoids a params.yaml file too.

* ref: simplify exp init --explicit explanation

* ref: explain why params.yaml are required (by default) in exp init
per #3071 (comment)

* ref: copy edits to exp init

* ref: add simple example to exp init
rel. #3071 (comment)

* ref: use model training example in exp init
per #3071 (review)

* ref: shorten sample block
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants