replace dvc run in cmd-ref #3223

Merged
merged 2 commits into from
Jan 27, 2022
Changes from 1 commit
4 changes: 2 additions & 2 deletions content/docs/command-reference/dag.md
@@ -32,8 +32,8 @@ final model, as well as accuracy [metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in `dvc.yaml`, which can be
written manually or built using the helper command `dvc run`. This allows DVC to
restore one or more pipelines later (see `dvc repro`).
written manually or built using the helper command `dvc stage add`. This allows
DVC to restore one or more pipelines later (see `dvc repro`).

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.
26 changes: 15 additions & 11 deletions content/docs/command-reference/import-url.md
@@ -110,17 +110,19 @@ Instead of:
$ dvc import-url https://data.dvc.org/get-started/data.xml data.xml
```

It is possible to use `dvc run`, for example (HTTP URL):
It is possible to use `dvc stage add`, for example (HTTP URL):

```dvc
$ dvc run -n download_data \
-d https://data.dvc.org/get-started/data.xml \
-o data.xml \
wget https://data.dvc.org/get-started/data.xml -O data.xml
$ dvc stage add -n download_data \
-d https://data.dvc.org/get-started/data.xml \
-o data.xml \
wget https://data.dvc.org/get-started/data.xml -O data.xml

$ dvc repro
```

`dvc import-url` generates an _import `.dvc` file_ and `dvc run` a regular stage
(in `dvc.yaml`).
`dvc import-url` generates an _import `.dvc` file_ and `dvc stage add` a regular
stage (in `dvc.yaml`).
Comment on lines +124 to +125
Contributor

💅 and -> while ?
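
For context on the sentence under review: the `dvc stage add` command in this hunk only records a stage in `dvc.yaml`, and the following `dvc repro` is what runs the download. A sketch of the entry it would be expected to write, assuming the standard `dvc.yaml` stage layout (this file content is not part of the PR):

```yaml
# Assumed result of the `dvc stage add -n download_data ...` command above.
stages:
  download_data:
    cmd: wget https://data.dvc.org/get-started/data.xml -O data.xml
    deps:
      # External (HTTP) dependency, listed by URL
      - https://data.dvc.org/get-started/data.xml
    outs:
      - data.xml
```

By contrast, `dvc import-url` would produce a standalone import `.dvc` file rather than a stage entry like this one.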


## Options

@@ -297,10 +299,12 @@ $ pip install -r src/requirements.txt
</details>

```dvc
$ dvc run -n prepare \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml
$ dvc stage add -n prepare \
-d src/prepare.py -d data/data.xml \
-o data/prepared \
python src/prepare.py data/data.xml

$ dvc repro
Running command:
python src/prepare.py data/data.xml
...
4 changes: 2 additions & 2 deletions content/docs/command-reference/index.md
@@ -15,8 +15,8 @@ does not change directories in your terminal).
- Copy data files or dataset directories for modeling into the repository, and
track them with DVC using the `dvc add` command.
- Process the data with your own source code, using `dvc.yaml` and/or the
`dvc run` command, specifying further <abbr>outputs</abbr> that should also be
tracked by DVC after the code is executed.
`dvc stage add` command to specify further <abbr>outputs</abbr> that should
also be tracked by DVC, and executing the code using `dvc repro`.
Comment on lines 17 to +19
Contributor

@jorgeorpinel jorgeorpinel Jan 27, 2022

Maybe we should keep this simpler by not mentioning `dvc stage add`. It's linked from the `dvc.yaml` guide anyway and, from what I remember, editing that file manually has been the default recommended way to create stages since DVC 1.x.
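
As an illustration of the manual route mentioned in this comment, a stage can be declared by editing `dvc.yaml` directly and then executed with `dvc repro`. A minimal sketch; the stage name, script, and paths are hypothetical and do not come from the PR:

```yaml
# Hand-written dvc.yaml; all names below are illustrative assumptions.
stages:
  process:
    cmd: python src/process.py data/raw.csv data/processed
    deps:
      - src/process.py
      - data/raw.csv
    outs:
      # Tracked by DVC once the stage has been executed
      - data/processed
```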

- Sharing a <abbr>DVC repository</abbr> with the codified data
[pipeline](/doc/command-reference/dag) will not include the project's
<abbr>cache</abbr>. Use [remote storage](/doc/command-reference/remote) and
6 changes: 3 additions & 3 deletions content/docs/command-reference/init.md
@@ -126,9 +126,9 @@ include:
automation like running a data pipeline using `cron`.

In this mode, DVC features related to versioning are not available. For example
automatic creation and updating of `.gitignore` files on `dvc add` or `dvc run`,
as well as `dvc diff` and `dvc metrics diff`, which require Git revisions to
compare.
automatic creation and updating of `.gitignore` files on `dvc add` or
`dvc stage add`, as well as `dvc diff` and `dvc metrics diff`, which require Git
revisions to compare.

DVC sets the `core.no_scm` config option value to `true` in the DVC
[config](/doc/command-reference/config) when initialized this way. This means
10 changes: 6 additions & 4 deletions content/docs/command-reference/metrics/diff.md
@@ -88,12 +88,14 @@ all the current metrics (without comparisons).

## Examples

Start by creating a metrics file and commit it (see the `-M` option of `dvc run`
for more details):
Start by creating a metrics file and commit it (see the `-M` option of
`dvc stage add` for more details):

```dvc
$ dvc run -n eval -M metrics.json \
'echo {"AUC": 0.9643, "TP": 527} > metrics.json'
$ dvc stage add -n eval -M metrics.json \
'echo {"AUC": 0.9643, "TP": 527} > metrics.json'

$ dvc repro

$ cat metrics.json
{"AUC": 0.9643, "TP": 527}
18 changes: 10 additions & 8 deletions content/docs/command-reference/metrics/index.md
@@ -34,7 +34,7 @@ positives, etc.

This type of metrics files are typically generated by user data processing code,
and are tracked using the `-m` (`--metrics`) and `-M` (`--metrics-no-cache`)
options of `dvc run`.
options of `dvc stage add`.

In contrast to `dvc plots`, these metrics should be stored in hierarchical
files. Unlike its `dvc plots` counterpart, `dvc metrics diff` can report the
@@ -64,9 +64,9 @@ stages:
```

> `cache: false` above specifies that `summary.json` is not tracked or
> <abbr>cached</abbr> by DVC (`-M` option of `dvc run`). These metrics files are
> normally committed with Git instead. See `dvc.yaml` for more information on
> the file format above.
> <abbr>cached</abbr> by DVC (`-M` option of `dvc stage add`). These metrics
> files are normally committed with Git instead. See `dvc.yaml` for more
> information on the file format above.

### Supported file formats

@@ -106,13 +106,15 @@ First, let's imagine we have a simple [stage](/doc/command-reference/run) that
produces an `eval.json` metrics file:

```dvc
$ dvc run -n evaluate -d code/evaluate.py -M eval.json \
python code/evaluate.py
$ dvc stage add -n evaluate -d code/evaluate.py -M eval.json \
python code/evaluate.py

$ dvc repro
```

> `-M` (`--metrics-no-cache`) tells DVC to mark `eval.json` as a metrics file,
> without tracking it directly (You can track it with Git). See `dvc run` for
> more info.
> without tracking it directly (You can track it with Git). See `dvc stage add`
> for more info.

Now let's print metrics values that we are tracking in this
<abbr>project</abbr>, using `dvc metrics show`:
10 changes: 5 additions & 5 deletions content/docs/command-reference/params/diff.md
@@ -26,7 +26,7 @@ repository history. The differences shown by this command include the old and
new param values, along with the param name.

> Parameter dependencies are defined in the `params` field of `dvc.yaml` (e.g.
> with the the `-p` (`--params`) option of `dvc run`).
> with the the `-p` (`--params`) option of `dvc stage add`).

Without arguments, `dvc params diff` compares parameters currently present in
the <abbr>workspace</abbr> (uncommitted changes) with the latest committed
@@ -95,10 +95,10 @@ Define a pipeline [stage](/doc/command-reference/run) with parameter
dependencies:

```dvc
$ dvc run -n train \
-d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
$ dvc stage add -n train \
-d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
```

Let's now print parameter values that we are tracking in this
38 changes: 19 additions & 19 deletions content/docs/command-reference/params/index.md
@@ -22,7 +22,7 @@ dependencies: _parameters_. They usually have simple names like `epochs`,
`learning-rate`, `batch_size`, etc.

To start tracking parameters, list them under the `params` field of `dvc.yaml`
stages (manually or with the the `-p`/`--params` option of `dvc run`). For
stages (manually or with the the `-p`/`--params` option of `dvc stage add`). For
example:

```yaml
@@ -97,14 +97,14 @@ process:
bow: 15000
```

Using `dvc run`, define a [stage](/doc/command-reference/run) that depends on
params `lr`, `layers`, and `epochs` from the params file above. Full paths
Using `dvc stage add`, define a [stage](/doc/command-reference/run) that depends
on params `lr`, `layers`, and `epochs` from the params file above. Full paths
jorgeorpinel marked this conversation as resolved.
should be used to specify `layers` and `epochs` from the `train` group:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train.epochs,train.layers \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p lr,train.epochs,train.layers \
python train.py
```

> Note that we could use the same parameter addressing with JSON, TOML, or
@@ -147,9 +147,9 @@ Alternatively, the entire group of parameters `train` can be referenced, instead
of specifying each of the params separately:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p lr,train \
python train.py
```

```yaml
@@ -161,12 +161,12 @@ params:

In the examples above, the default parameters file name `params.yaml` was used.
Note that this file name can be redefined using a prefix in the `-p` argument of
`dvc run`. In our case:
`dvc stage add`. In our case:

```dvc
$ dvc run -n train -d train.py -d logs/ -o users.csv -f \
-p parse_params.yaml:threshold,classes_num \
python train.py
$ dvc stage add -n train -d train.py -d logs/ -o users.csv -f \
-p parse_params.yaml:threshold,classes_num \
python train.py
```

## Examples: Print all parameters
@@ -234,9 +234,9 @@ The following [stage](/doc/command-reference/run) depends on params `BOOL`,
`INT`, as well as `TrainConfig`'s `EPOCHS` and `layers`:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TrainConfig.EPOCHS,TrainConfig.layers \
python train.py
```

Resulting `dvc.yaml` and `dvc.lock` files (notice the `params` lists):
@@ -283,7 +283,7 @@ can be referenced
supported), instead of the parameters in it:

```dvc
$ dvc run -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
$ dvc stage add -n train -d train.py -d users.csv -o model.pkl \
-p params.py:BOOL,INT,TestConfig \
python train.py
```
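
For reference, the `-p` arguments touched throughout this file map to the `params` field of a stage in `dvc.yaml`. A sketch of what the `-p lr,train.epochs,train.layers` variant would be expected to produce, assuming the standard `dvc.yaml` layout (not copied from the PR):

```yaml
# Assumed result of: dvc stage add -n train ... -p lr,train.epochs,train.layers
stages:
  train:
    cmd: python train.py
    deps:
      - train.py
      - users.csv
    params:
      # Read from the default params.yaml
      - lr
      - train.epochs
      - train.layers
    outs:
      - model.pkl
```

When a file prefix such as `parse_params.yaml:` is given, the parameter names are instead grouped under an entry keyed by that file name.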
8 changes: 4 additions & 4 deletions content/docs/command-reference/plots/modify.md
@@ -27,8 +27,8 @@ plots are generated with `dvc plot show` or `dvc plot diff`. This command sets
(or unsets) default display properties for a specific metrics file.

The path to the metrics file `target` is required. It must be listed in a
`dvc.yaml` file (see the `--plots` option of `dvc run`). `dvc plots modify` adds
the display properties to `dvc.yaml`.
`dvc.yaml` file (see the `--plots` option of `dvc stage add`).
`dvc plots modify` adds the display properties to `dvc.yaml`.

Property names are passed as [options](#options) to this command (prefixed with
`--`). These are based on the [Vega-Lite](https://vega.github.io/vega-lite/)
@@ -134,8 +134,8 @@ plots:

## Example: Template change

_dvc run --plots file.csv ..._ command assign the default template that needs to
be changed in many cases. A simple command changes the template:
_dvc stage add --plots file.csv ..._ command assign the default template that
Contributor

This should render correctly now:

Suggested change
_dvc stage add --plots file.csv ..._ command assign the default template that
`dvc stage add --plots file.csv ...` assign the default template that

It's just a bit unclear why we're referring to such a specific stage add command at the opening of this example...

needs to be changed in many cases. A simple command changes the template:

```dvc
$ dvc plots modify classes.csv --template confusion
37 changes: 15 additions & 22 deletions content/docs/command-reference/repro.md
@@ -30,7 +30,8 @@ are run one after the other in the order they are defined. The failure of any
command will halt the remaining stage execution, and raises an error.

> Pipeline stages are defined in `dvc.yaml` (either manually or by using
> `dvc run`) while initial data dependencies can be registered with `dvc add`.
> `dvc stage add`) while initial data dependencies can be registered with
> `dvc add`.

`dvc repro` is similar to [Make](https://www.gnu.org/software/make/) in software
build automation, but DVC captures build requirements
@@ -137,8 +138,8 @@ up-to-date and only execute the final stage.
`dvc commit` to finish the operation.

- `-m`, `--metrics` - show metrics after reproduction. The target pipelines must
have at least one metrics file defined either with `dvc metrics` or by the
`-M` or `-m` options of `dvc run`
have at least one [metrics](/doc/command-reference/metrics) file defined in
`dvc.yaml`.
jorgeorpinel marked this conversation as resolved.

- `--dry` - only print the commands that would be executed without actually
executing the commands.
@@ -170,10 +171,10 @@ up-to-date and only execute the final stage.
stages (`A` and below) depend on `requirements.txt`, we can specify it in `A`,
and omit it in `B` and `C`.

Like with the `--force` option on `dvc run`, this is a way to force-execute
stages without changes. This can also be useful for pipelines containing
stages that produce non-deterministic (semi-random) outputs, where outputs can
vary on each execution, meaning the cache cannot be trusted for such stages.
This is a way to force-execute stages without changes. This can also be useful
for pipelines containing stages that produce non-deterministic (semi-random)
outputs, where outputs can vary on each execution, meaning the cache cannot be
trusted for such stages.

- `--downstream` - only execute the stages after the given `targets` in their
corresponding pipelines, including the target stages themselves. This option
@@ -213,11 +214,13 @@ best
And runs a few simple transformations to filter and count numbers:

```dvc
$ dvc run -n filter -d text.txt -o numbers.txt \
$ dvc stage add -n filter -d text.txt -o numbers.txt \
"cat text.txt | egrep '[0-9]+' > numbers.txt"

$ dvc run -n count -d numbers.txt -d process.py -M count.txt \
$ dvc stage add -n count -d numbers.txt -d process.py -M count.txt \
"python process.py numbers.txt > count.txt"

$ dvc repro
```

Where `process.py` is a script that, for simplicity, just prints the number of
@@ -232,7 +235,7 @@ with open(sys.argv[1], 'r') as f:
print(num_lines)
```

The result of executing these `dvc run` commands should look like this:
The result of executing `dvc repro` should look like this:
Contributor

The script is about `num_lines`, but the result shows the `tree` output. I think we need the exact result first, and then a one-sentence explanation of the `tree` output.

Collaborator Author

Thanks @iesahin! PTAL.
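
As background for this thread: with `dvc stage add`, the two commands in the hunk above only write stage definitions, and the added `dvc repro` is what actually produces `numbers.txt` and `count.txt`. A sketch of the `dvc.yaml` they would be expected to generate, assuming the standard stage layout (not part of the PR diff):

```yaml
# Assumed result of the two `dvc stage add` commands above.
stages:
  filter:
    cmd: cat text.txt | egrep '[0-9]+' > numbers.txt
    deps:
      - text.txt
    outs:
      - numbers.txt
  count:
    cmd: python process.py numbers.txt > count.txt
    deps:
      - numbers.txt
      - process.py
    metrics:
      # -M marks count.txt as a metrics file that DVC does not cache
      - count.txt:
          cache: false
```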


```dvc
$ tree
@@ -248,18 +251,8 @@ $ tree
You may want to check the contents of `dvc.lock` and `count.txt` for later
reference.

Ok, now let's run `dvc repro`:

```dvc
$ dvc repro
Stage 'filter' didn't change, skipping
Stage 'count' didn't change, skipping
Data and pipelines are up to date.
```

It makes sense, since we haven't changed any of the dependencies of this
pipeline (`text.txt` and `process.py`). Now, let's imagine we want to print a
description and we add this line to the `process.py`:
Now, let's imagine we want to print a description and we add this line to the
`process.py`:

```python
...
2 changes: 1 addition & 1 deletion content/docs/command-reference/status.md
@@ -57,7 +57,7 @@ description_, as detailed below:

- _always changed_ means that this is a `.dvc` file with no dependencies (see
`dvc add`) or that the stage in `dvc.yaml` has the `always_changed: true`
value set (see `--always-changed` option in `dvc run`).
value set (see `--always-changed` option in `dvc stage add`).
Contributor

Actually these can now link directly to the option anchor like this 🙂

Suggested change
value set (see `--always-changed` option in `dvc stage add`).
value set (see `dvc stage add --always-changed`).

Per #3140

But we probably need a separate issue to update these.

Contributor

Extracted to #3236


- _changed deps_ or _changed outs_ means that there are changes in dependencies
or outputs tracked by the stage or `.dvc` file. Depending on the use case,