guide: DVC Experiments Overview #2909

Merged
merged 50 commits into from
Dec 13, 2021
Changes from 45 commits
5e43591
guide: add DVC Experiments page and links +
jorgeorpinel Oct 9, 2021
6b7300a
guide: remove checkpoint related changes
jorgeorpinel Oct 10, 2021
6027e15
guide: remove `dvc experiments` long cmd autolinks
jorgeorpinel Oct 10, 2021
7704a4d
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Oct 11, 2021
8f04899
guide: move run-cache section back to Exp Mgmt index bottom
jorgeorpinel Oct 11, 2021
3bfd2a9
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 1, 2021
0c2bcf5
guide: Exp Mgmt/ DVC Exps -> Exps Overview
jorgeorpinel Nov 1, 2021
27afdc1
guide: clear separation between Exp Mgmt index and Overview page
jorgeorpinel Nov 2, 2021
30db819
guide: single guide for Persisting Exps content and
jorgeorpinel Nov 2, 2021
aa3c5d0
guide: begin extracting Exp details from Running to Overview
jorgeorpinel Nov 2, 2021
7710433
guide: make ToC entry for Run Cache section
jorgeorpinel Nov 2, 2021
a133f70
Update content/docs/user-guide/experiment-management/index.md
jorgeorpinel Nov 4, 2021
32a269f
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 4, 2021
af94248
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Nov 17, 2021
dacaf85
[NESTED] guide: Exp implementation details, naming into Overview (#3006)
jorgeorpinel Nov 17, 2021
cab14da
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Nov 29, 2021
9a1e142
Merge branch 'exp/dvc-exps-page' of github.com:iterative/dvc.org into…
jorgeorpinel Nov 30, 2021
b40f340
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Nov 30, 2021
73175a9
guide: emphasize dvc exps are not part of Git tree in overview
jorgeorpinel Nov 30, 2021
112ad87
guide: ID->name in dvc exps overview
jorgeorpinel Nov 30, 2021
9c2a55c
guide: ID->name in other exp guides
jorgeorpinel Nov 30, 2021
9b2902a
guide: Visualize->Review in exp/overview/basic-workflow
jorgeorpinel Nov 30, 2021
7b9384f
guide: don't say "cleans the slate" in exp/overview/basic-workflow
jorgeorpinel Nov 30, 2021
c9493f4
giude: soften params description in exps index
jorgeorpinel Nov 30, 2021
42454f0
guide: generalize dvc exps basic workflow
jorgeorpinel Nov 30, 2021
bd95136
guide: Properties section in DVC Exps overview page
jorgeorpinel Nov 30, 2021
6162f5a
guide: exp init section in Exp Overview page
jorgeorpinel Nov 30, 2021
63a9864
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 1, 2021
5043e64
guide: clarify dvc exp implementation
jorgeorpinel Dec 1, 2021
27f01e6
guide: expand on Exp Overview motivation
jorgeorpinel Dec 1, 2021
a799743
guide: direct language in Exp Overview/ workflow intro
jorgeorpinel Dec 1, 2021
59505f6
guide: mention metrics in exp init intro (Exp Overview)
jorgeorpinel Dec 1, 2021
3d0bede
guide: intro exp init before giving specific examples of what it does
jorgeorpinel Dec 1, 2021
db2d610
guide: hint forach stages for hybrid exp org pattern
jorgeorpinel Dec 1, 2021
f6eef79
guide: exp mgmt index copy edits
jorgeorpinel Dec 1, 2021
c68fc78
guide: mention label-based exp organization
jorgeorpinel Dec 1, 2021
3384af0
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 7, 2021
9fd3b3a
guide: hide exp naming section in overview page and
jorgeorpinel Dec 7, 2021
f241901
guide: mention `exp init -i` in Overview
jorgeorpinel Dec 7, 2021
e122b0a
guide: typo fix
jorgeorpinel Dec 7, 2021
659dd82
Merge branch 'master' into exp/dvc-exps-page +
jorgeorpinel Dec 7, 2021
73d510d
ref: exp apply copy edits
jorgeorpinel Dec 7, 2021
9d43ca6
ref: mention init before exp init
jorgeorpinel Dec 7, 2021
24c967d
guide: correct info aboug exp init in Exp Overview
jorgeorpinel Dec 7, 2021
439050e
ref: link from exp init to corresponding guide
jorgeorpinel Dec 7, 2021
3af2f9a
guide: make exp intro more concrete
jorgeorpinel Dec 8, 2021
12f8797
guide: rewrite exp init section of Exps Overview page
jorgeorpinel Dec 8, 2021
ad652a6
Merge branch 'master' into exp/dvc-exps-page
jorgeorpinel Dec 10, 2021
8aed622
ref: roll back unrelated ref changes (moved to ref/exp-misc)
jorgeorpinel Dec 10, 2021
c088a06
guide: roll back unrelated changes (moved to #3080)
jorgeorpinel Dec 10, 2021
26 changes: 15 additions & 11 deletions content/docs/command-reference/exp/apply.md
@@ -15,21 +15,25 @@ positional arguments:
## Description

Restores an `experiment` into the workspace as long as no more Git commits have
been made after the target experiment (`HEAD` hasn't moved). The `experiment`
can be referenced by name or hash (see `dvc exp run` for details). This changes
any files (code, data, <abbr>parameters</abbr>, <abbr>metrics</abbr>, etc.)
needed to reflect the experiment conditions and results in the workspace.
been made after the target experiment (`HEAD` hasn't moved). The experiment can
be referenced by name or hash (see `dvc exp run` for details).

⚠️ Conflicting changes in the workspace are overwritten unless `--no-force` is
used.
Specifically, `dvc exp apply` changes any files (code, data,
<abbr>parameters</abbr>, <abbr>metrics</abbr>, etc.) needed to reflect the
experiment conditions and results in the workspace. Current changes to the
workspace are preserved except if they conflict with the experiment in question.

⚠️ Conflicting changes in the workspace are overwritten unless `--no-force` is
used.

This is typically used after choosing a target `experiment` with `dvc exp show`
or `dvc exp diff`, and before committing it to Git (making it
[persistent](/doc/user-guide/experiment-management#persistent-experiments)).
or `dvc exp diff`, and before committing it to Git (making it [persistent]).

> Note that if a history of [checkpoints] is found in the `experiment`, it will
> **not** be preserved when applying and committing it.

Note that the history of
[checkpoints](/doc/command-reference/exp/run#checkpoints) found in the
`experiment` is **not** preserved when applying and committing it.
[persistent]: /doc/user-guide/experiment-management/persisting-experiments
[checkpoints]: /doc/user-guide/experiment-management/checkpoints

## Options

20 changes: 11 additions & 9 deletions content/docs/command-reference/exp/branch.md
@@ -15,26 +15,28 @@ positional arguments:

## Description

Makes a named Git
[`branch`](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging)
containing the target `experiment` (making it
[persistent](/doc/user-guide/experiment-management#persistent-experiments)). For
[checkpoint experiments](/doc/command-reference/exp/run#checkpoints), the new
branch will contain multiple commits (the checkpoints).
Makes a named Git [`branch`] containing the target `experiment` (making it
[persistent]). For [checkpoint experiments], the new branch will contain
multiple commits (the checkpoints).

The new `branch` will be based on the experiment's parent commit (`HEAD` at the
time that the experiment was run). Note that DVC **does not** switch into the
new `branch` automatically.

`dvc exp branch` is useful to make an experiment persistent without modifying
the workspace, so it can be continued,
[stored, and shared](https://dvc.org/doc/use-cases/sharing-data-and-model-files)
in a normal Git + DVC workflow.
the workspace, so it can be continued, [stored and shared] in a normal Git +
DVC workflow.

To switch into the new branch, use `git checkout branch` and `dvc checkout`. Or
use `git merge branch` and `dvc repro` to combine it with your current project
version.

[`branch`]:
https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging
[persistent]: /doc/user-guide/experiment-management/persisting-experiments
[checkpoint experiments]: /doc/command-reference/exp/run#checkpoints
[stored and shared]: /doc/use-cases/sharing-data-and-model-files

## Options

- `-h`, `--help` - shows the help message and exit.
8 changes: 8 additions & 0 deletions content/docs/command-reference/exp/init.md
@@ -3,6 +3,9 @@
Codify project using [DVC metafiles](/doc/user-guide/project-structure) to run
[experiments](/doc/user-guide/experiment-management).

> Requires a <abbr>DVC repository</abbr>, created with `git init` and
> `dvc init`.

## Synopsis

```usage
@@ -32,6 +35,11 @@ training of machine learning models.
This command is intended to be a quick way to start running experiments. To
create more complex stages and pipelines, use `dvc stage add`.

> 📖 More context in [Experiments Overview].

[experiments overview]:
/doc/user-guide/experiment-management/experiments-overview

### The `command` argument

The `command` argument is optional, if you are using `--interactive` mode. The
45 changes: 20 additions & 25 deletions content/docs/command-reference/exp/run.md
@@ -22,45 +22,40 @@ Provides a way to execute and track <abbr>experiments</abbr> in your
<abbr>project</abbr> without polluting it with unnecessary commits, branches,
directories, etc.

> `dvc exp run` is equivalent to `dvc repro` for experiments. It has the same
> behavior when it comes to `targets` and stage execution (restores the
> dependency graph, etc.). See the command [options](#options) for more on the
> differences.
> `dvc exp run` is equivalent to `dvc repro` for <abbr>experiments</abbr>. It
> has the same behavior when it comes to `targets` and stage execution (restores
> the dependency graph, etc.). See the command [options](#options) for more on
> the differences.

Before running an experiment, you'll probably want to make modifications such as
data and code updates, or <abbr>hyperparameter</abbr> tuning. For the latter,
you can use the `--set-param` (`-S`) option of this command to change
`dvc param` values on the fly.

📖 See [DVC Experiments](/doc/user-guide/experiment-management) for more
information.

Each experiment creates and tracks a project variation based on your
<abbr>workspace</abbr> changes. Experiments will have a unique, auto-generated
name like `exp-bfe64` by default, which can be customized using the `--name`
(`-n`) option.

<details>

### ⚙️ How does DVC track experiments?

Experiments are custom
[Git references](https://git-scm.com/book/en/v2/Git-Internals-Git-References)
(found in `.git/refs/exps`) with a single commit based on `HEAD` (not checked
out by DVC). Note that these commits are not pushed to the Git remote by default
(see `dvc exp push`).
Each experiment creates and tracks a project variation based on the changes in
your <abbr>workspace</abbr>. The results of the last `dvc exp run` will be
reflected in the workspace. Experiments will have an auto-generated ID like
`exp-bfe64` by default. A custom name can be given instead, using the `--name`
(`-n`) option.

</details>

The results of the last `dvc exp run` can be seen in the workspace. To display
and compare multiple experiments, use `dvc exp show` or `dvc exp diff`
(`plots diff` also accepts experiment names as `revisions`). Use `dvc exp apply`
to restore the results of any other experiment instead.

Successful experiments can be made
[persistent](/doc/user-guide/experiment-management#persistent-experiments) by
committing them to the Git repo. Unnecessary ones can be removed with
To display and compare multiple experiments, use `dvc exp show` or
`dvc exp diff` (`plots diff` also accepts experiment names as `revisions`). Use
`dvc exp apply` to restore the results of any experiment, for example to [commit
them][persisting] to Git. Unnecessary experiments can be removed with
`dvc exp remove` or `dvc exp gc` (or abandoned).

> Note that experiment data will remain in the <abbr>cache</abbr> until you use
> regular `dvc gc` to clean it up.
> Note that experiment data will remain in the local <abbr>cache</abbr> until
> you use regular `dvc gc` to clean it up.

[persisting]: /doc/user-guide/experiment-management/persisting-experiments

## Checkpoints

1 change: 1 addition & 0 deletions content/docs/sidebar.json
@@ -148,6 +148,7 @@
"slug": "experiment-management",
"source": "experiment-management/index.md",
"children": [
"experiments-overview",
"running-experiments",
"comparing-experiments",
"sharing-experiments",
@@ -2,7 +2,9 @@

Although DVC uses minimal resources to keep track of the experiments, they may
clutter tables and the workspace. DVC lets you remove specific experiments from
the workspace or delete all not-yet-persisted experiments at once.
the workspace or delete all not-yet-[persisted] experiments at once.

[persisted]: /doc/user-guide/experiment-management/persisting-experiments

## Removing specific experiments

@@ -384,8 +384,8 @@ params.yaml train.epochs 10 10 0
## Compare an experiment with the workspace

When you want to compare two experiments, either the baseline experiment in a
commit, branch, tag or an attached experiment with ID, you can supply their
names to `dvc exp diff`.
commit, branch, or tag, or an attached experiment referenced by name, you can
supply any of these references to `dvc exp diff`.

```
$ dvc exp diff cnn-128 cnn-64
@@ -0,0 +1,71 @@
# DVC Experiments Overview
@shcheklein (Member), Nov 2, 2021

it can be a good page (still not clear if need a separate one for this though, considering that we have index)

should we do a diagram here with the basic workflow?

should we include cleaning here?

@jorgeorpinel (Contributor Author), Nov 4, 2021

Good ideas. Not sure about Cleaning Exps in here (you need to know how to make them first?) but a diagram for the workflow would be nice.

I also recently thought about possibly putting experiment naming information (IDs vs names) here. Added check box to the PR description. Thought that may be more appropriate for the Running or Comparing Exps guides, not sure.

Contributor Author

a diagram for the workflow would be nice

Since finalizing that properly may involve including alternative paths (a sort of flow chart) and design work, I vote to make it a follow-up issue (tied to the Exp Versioning release cc @dberenbaum) so we can merge this, if the content is approved in general.


DVC Experiments are captured automatically by DVC when [run]. Each experiment
creates and tracks a variation of your data science project based on the changes
in your <abbr>workspace</abbr>.

Experiments preserve a connection to the latest commit in the current branch
(Git `HEAD`) as their parent or _baseline_, but do not form part of the regular
Git tree or workflow (unless you make them [persistent]). This prevents
polluting Git namespaces and bloating the repo unnecessarily.

[run]: /doc/user-guide/experiment-management/running-experiments

<details>

### ⚙️ How does DVC track experiments?

Experiments are custom [Git references](/blog/experiment-refs) (found in
`.git/refs/exps`) with one or more commits based on `HEAD`. These commits are
hidden and not checked out by DVC. Note that these are not pushed to Git remotes
by default either (see `dvc exp push`).

Note that DVC Experiments require a unique name to identify them. DVC will
usually auto-generate one by default, such as `exp-bfe64` (based on the
experiment's hash). A custom name can be set instead, using the `--name`/`-n`
option of `dvc exp run`. These names can be used to reference experiments in
other `dvc exp` subcommands.

</details>
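
The idea of a commit that hangs off `HEAD` under a custom reference, without appearing as a branch, can be illustrated with plain Git alone. The sketch below does not use DVC and does not reproduce the exact layout DVC keeps under `.git/refs/exps`; the ref path `refs/exps/demo/exp-bfe64`, the temporary repository, and the `params.yaml` contents are assumptions made only for illustration.

```shell
# Illustration with plain Git only (no DVC involved).
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Baseline commit: plays the role of HEAD, the experiment's parent.
echo "epochs: 10" > params.yaml
git add params.yaml
git commit -q -m "baseline"
baseline=$(git rev-parse HEAD)

# Build an "experiment" commit on top of the baseline without moving HEAD,
# then store it under a custom ref namespace (hypothetical path).
echo "epochs: 20" > params.yaml
git add params.yaml
tree=$(git write-tree)
exp_commit=$(git commit-tree "$tree" -p "$baseline" -m "hypothetical experiment")
git update-ref refs/exps/demo/exp-bfe64 "$exp_commit"

# Put the index and workspace back to the baseline state.
git reset -q --hard "$baseline"

# The custom ref exists and points at a child of the baseline,
# yet it is not listed as a branch and HEAD has not moved.
branch_count=$(git branch --list | wc -l | tr -d ' ')
exp_refs=$(git for-each-ref --format='%(refname)' refs/exps)
exp_parent=$(git rev-parse "refs/exps/demo/exp-bfe64^")
head_now=$(git rev-parse HEAD)

echo "branches: $branch_count"
echo "experiment refs: $exp_refs"
```

Because the ref lives outside `refs/heads` and `refs/tags`, a default `git push` does not transfer it, which is consistent with the note above that experiments are not pushed to Git remotes unless `dvc exp push` is used.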

## Basic workflow

`dvc exp` commands let you automatically track a variation of a project version
(the baseline). You can create independent groups of experiments this way, as
well as review, compare, and restore them later. The basic workflow goes like
this:

- Modify hyperparameters or other dependencies (input data, source code,
commands to execute, etc.). Leave these changes un-committed in Git.

Collaborator

It might help to discuss why we have this workflow and want to leave changes un-committed. In other experiment tracking tools, the workflow looks like:

  1. Run the experiment.
  2. Log the commit hash of the current HEAD, even though it doesn't include your dirty changes.
  3. If everything is successful, save changes in a new commit.

This creates a confusing state where the experiment should really be associated with the second commit instead of the first.

It might be too much detail or inappropriate for the page, but maybe it can be summarized, or it might spark other ideas. This is probably a good point for the blog post but not sure if there's a place for it...

@jorgeorpinel (Contributor Author), Dec 1, 2021

Related to the discussions in https://www.notion.so/iterative/Experiment-Workflows-3873c4f3cc2e49c6a7871a831bc8302b

Yeah this needs more work. I'm hoping for now it's mergeable but I'll improve it as much as possible ⌛


- [Run experiments][run] with `dvc exp run` (instead of `repro`). The results
are reflected in your <abbr>workspace</abbr>, and tracked automatically.
- Review and [compare] experiments with `dvc exp show` or `dvc exp diff`, using
[metrics](/doc/command-reference/metrics) to identify the best one(s). Repeat
🔄
- Make certain experiments [persistent] by committing their results to Git. This
lets you repeat the process from that point.

[pipeline]: /doc/user-guide/project-structure/pipelines-files
[compare]: /doc/user-guide/experiment-management/comparing-experiments
[persistent]: /doc/user-guide/experiment-management/persisting-experiments
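
The steps above could be scripted roughly as follows. This is a sketch, not an official recipe: it assumes DVC is installed and the project already has a `dvc.yaml` pipeline, and the helper names `try_experiment` and `persist_experiment`, the experiment name, and the commit message are hypothetical. Only commands and options mentioned in these guides are used, plus `--no-pager` for `dvc exp show`.

```shell
# Run one experiment with a changed hyperparameter, then review it.
# Usage (placeholder values): try_experiment cnn-20 train.epochs=20
try_experiment() {
  exp_name="$1"   # custom experiment name, passed to --name
  param="$2"      # parameter override, passed to --set-param

  # Change the parameter on the fly and run the experiment.
  # The workspace changes stay uncommitted in Git.
  dvc exp run --name "$exp_name" --set-param "$param" || return 1

  # Review and compare experiments against the baseline.
  dvc exp show --no-pager
  dvc exp diff
}

# Make a chosen experiment persistent by committing its results to Git.
# Usage (placeholder value): persist_experiment cnn-20
persist_experiment() {
  exp_name="$1"

  # Restore the experiment into the workspace, then commit it.
  dvc exp apply "$exp_name" || return 1
  git add . && git commit -m "Persist experiment $exp_name"
}
```

Splitting the run-and-review step from the persist step mirrors the list: the first helper can be repeated as often as needed, and only the experiment worth keeping goes through the second.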

## Initialize DVC Experiments on any project

DVC Experiments build on basic semantics of <abbr>DVC projects</abbr>. This
means that minimal formalities are required.

`dvc exp init` lets you quickly onboard an existing data science project to use
DVC Experiments, without having to worry about bootstrapping DVC manually. You
can either supply a `command` to execute your experiments or use the
`--interactive` flag (`-i`) to be prompted for that and other optional
customizations.

This creates a simple `dvc.yaml` file for you. It uses sane default locations
for your project's <abbr>dependencies</abbr> (data, parameters, source code) and
<abbr>outputs</abbr> (ML models or other artifacts, <abbr>metrics</abbr>, etc.)
-- which you can customize via `-i` or other options of `dvc exp init`.

You can review the results (and commit them to Git) to begin using DVC
Experiments. Now you can move on to [running your experiments][run] (next).
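
As a rough sketch of the two ways to call `dvc exp init` described above, the hypothetical helper below passes a training command through when one is given and falls back to `--interactive` otherwise. It assumes DVC is installed and that the repository has already been set up; `init_experiments` and the example command `python src/train.py` are placeholders, not part of DVC.

```shell
# Onboard a project to DVC Experiments.
# Usage: init_experiments python src/train.py   (supply the command)
#        init_experiments                       (be prompted instead)
init_experiments() {
  if [ "$#" -eq 0 ]; then
    # No command supplied: let DVC prompt for it and for other options.
    dvc exp init --interactive
  else
    # Treat all arguments as the command that runs the experiment.
    dvc exp init "$@"
  fi
}
```

After either call, the generated `dvc.yaml` can be reviewed and committed to Git before running the first experiment.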

[codify a pipeline]: /doc/user-guide/project-structure/pipelines-files