Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

guide: Add --recursive options to Running Experiments #2846

Merged
merged 11 commits into from
Oct 24, 2021

Conversation

iesahin
Copy link
Contributor

@iesahin iesahin commented Sep 22, 2021

Related #2730

--glob seems to take some time, so this one only adds the --recursive

Also per iterative/dvc#6458

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 22, 2021 15:11 Inactive
@iesahin iesahin self-assigned this Sep 22, 2021
@jorgeorpinel jorgeorpinel changed the title guide: Add --recursive` options to Running Experiments guide: Add º--recursive` options to Running Experiments Sep 25, 2021
@jorgeorpinel jorgeorpinel changed the title guide: Add º--recursive` options to Running Experiments guide: Add --recursive options to Running Experiments Sep 25, 2021
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 25, 2021 01:09 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 25, 2021 03:27 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 27, 2021 11:44 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 27, 2021 11:45 Inactive
@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 28, 2021

I'd just close this (and maybe #2730) per #2846 (comment).

@iesahin
Copy link
Contributor Author

iesahin commented Sep 28, 2021

I'd just close this (and maybe #2730) per #2846 (comment).

dvc exp run does not support all options of dvc repro and considering cmd ref as a substitute for the UG is something against our decision to write a UG in the first place. UG contains (or may contain) some duplicate content from the REF. That seems normal to be.

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Sep 29, 2021

dvc exp run doesn't support all dvc repro options BTW. We can't assume the reader will be familiar with repro.
and considering cmd ref as a substitute for the UG is something against our decision to write a UG in the first place.

I'm not suggesting to only have repro/exp run --recursive info in the ref. We'll put it in a Data Pipelines guide at some point (maybe create some issues about that?). It's not about experiment management. UPDATE: See #2846 (comment).

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 29, 2021 04:16 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-iesahin-issue27-pwlhbh September 29, 2021 04:31 Inactive
Comment on lines 82 to 86
### Running pipelines recursively

DVC supports pipelines defined in more than one `dvc.yaml` file. These can
reside in subfolders inside the workspace, and you may want to run all of them
at once. Example project:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continues #2846 (review)

To summarize on this, exp run --recursive has to do with how your project structure is organized so in case people have several dvc.yaml files they may need it. I don't think we need to mention pipelines to explain this, even when they user may be using them indeed — they would know what they are and how/where they codified them already.

That said if you want to mention both the cases with multiple dvc.yaml files and the one with multiple pipelines as part of the explanation here, that's OK too. We just shouldn't focus on the concept of pipelines I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option or an additional thing we may want to do is to create a separate "Organizing Experiments" page (instead of https://dvc.org/doc/user-guide/experiment-management#organization-patterns) and mention exp run --recursive as a tool in there. This could be a separate issue or check box somewhere. Up to you @iesahin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is about "running experiments" and I believe it involves experiments that use pipelines in directory hierarchies. We can mention --recursive in other places as a way to organize the experiments, but here we are discussing how to run experiments if their parts are found in recursive directories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipelines are an inherent part of experiments, even if we have "experiments without pipelines" at some point, this guide will have to cover "experiments with pipelines." Specifying pipelines, organizing them or adding/deleting stages to them are irrelevant topics to this section but "running pipelines" should be covered here IMO. If we'd remove all "pipeline related stuff" from this section, about 90% of the material should be deleted. We don't have any means to run the experiments other than running the pipelines.

This individual section tells how to run the experiments if their parts are organized in directory hierarchies, not organizing experiments into directory hierarchies.

Copy link
Contributor

@jorgeorpinel jorgeorpinel Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we are discussing how to run experiments if their parts are found in recursive directories

Right, it makes sense to mention exp --recursive here. I just don't think we need a full section that describes the project's organization, much less any more info about data pipelines. It can be a paragraph (maybe even a single sentence) in the existing Running all pipelines section.

Pipelines are an inherent part of experiments
at some point, this guide will have to cover "experiments with pipelines."

Disagree. You don't need a pipeline for experiments, you just need a dvc.yaml file. It's more of a technicality and in the future this implementation will be made transparent with exp init anyway. But we've discussed this a lot already...

If we'd remove all "pipeline related stuff" from this section, about 90% of the material

Most would stay, just explained from a different perspective, as mentioned in #2846 (comment).

Copy link
Contributor

@jorgeorpinel jorgeorpinel Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new comment was meant as a way to wrap it up but we are going in circles in the discussion again... Happy to do a call too.

So far I think it's clear we agree we want to mention --recursive in this guide but IMO it could be a single sentence in an existing section e.g. https://dvc.org/doc/user-guide/experiment-management/running-experiments#running-the-pipeline, even a single sentence e.g.:

"If your project uses more than one dvc.yaml file, you can run them all with the --recursive option."
(just an extremely short idea, not an exact suggestion)

I also wouldn't mind a whole section, especially if it's quite short, but currently we are introducing more data pipelining information and we have a task to extract all of that so it seems counterproductive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, was it suggested before? :)

@iesahin do you have a strong opinion on keeping the whole section?

(I personally tend to agree - even if we keep the section - I would try to generalize it into other possible ways (--recursive` is only one single option, there are way more different ways)

Copy link
Contributor Author

@iesahin iesahin Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion here, I'll reduce it to one or two sentences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! I've linked to this discussion from a task in #2911 so we can get back to the question of experiment organization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've cut the example to make it shorter. @jorgeorpinel

@shcheklein shcheklein temporarily deployed to dvc-org-iesahin-issue27-pwlhbh October 20, 2021 12:02 Inactive
Comment on lines 82 to 91
### Running pipelines recursively

When your pipeline is defined in recursive subfolders, you can selectively run
them using the `--recursive` option.

```dvc
$ dvc exp run --recursive dir/
```

It will run all the pipelines under `dir/`, and its subdirectories.
Copy link
Contributor

@jorgeorpinel jorgeorpinel Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I have more feedback. To some extent --recursive is similar to --all-pipelines (the same but restricted to a certain directory I think), so let's combine the sections? They're very short on their own anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do believe that having more titles makes it easier to follow the text. This is technical writing, not a fiction where you have to read sequentially to understand. People will be coming to this page from all different directions.

"To some extent" all of the options of dvc exp run are similar, they select some stage to run. Could we merge all of the sections, then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but by making title too specific you make for users very hard to relate them to the problem they have. E.g. in this case people will be looking for "how to run a specific pipeline" vs "running pipelines recursively" (most likely, not precise example, important to illustrate the point)

E.g. a better general section for you can be:

Run pipelines selectively (you can see it from the way you write about it)

Copy link
Contributor

@jorgeorpinel jorgeorpinel Oct 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we merge all of the sections, then?

Not all but https://dvc.org/doc/user-guide/experiment-management/running-experiments does have lots of very small sections (and subsections) in general, now that I see it (as context). We could probably consolidate some easily if we wanted (not suggesting that for this PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL.

@jorgeorpinel

This comment has been minimized.

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-org-iesahin-issue27-pwlhbh October 24, 2021 20:59 Inactive
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let you merge if you agree @iesahin

@shcheklein shcheklein merged commit 7612a8a into master Oct 24, 2021
@shcheklein shcheklein deleted the iesahin/issue2730 branch October 24, 2021 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants