docs: replace dvc pipeline with dvc dag #1383

Merged on Jun 24, 2020 (8 commits)
3 changes: 0 additions & 3 deletions config/prismjs/dvc-commands.js
@@ -25,9 +25,6 @@ module.exports = [
'plots modify',
'plots diff',
'plots',
- 'pipeline show',
- 'pipeline list',
- 'pipeline',
'move',
'metrics show',
'metrics diff',
4 changes: 2 additions & 2 deletions content/docs/command-reference/checkout.md
@@ -65,8 +65,8 @@ progress made by the checkout.

There are two methods to restore a file missing from the cache, depending on the
situation. In some cases a pipeline must be reproduced (using `dvc repro`) to
- regenerate its outputs (see also `dvc pipeline`). In other cases the cache can
- be pulled from remote storage using `dvc pull`.
+ regenerate its outputs (see also `dvc dag`). In other cases the cache can be
+ pulled from remote storage using `dvc pull`.

## Options

108 changes: 108 additions & 0 deletions content/docs/command-reference/dag.md
@@ -0,0 +1,108 @@
# dag

Show [stages](/doc/command-reference/run) in a pipeline that lead to the
specified stage. By default it lists
[DVC-files](/doc/user-guide/dvc-files-and-directories).

## Synopsis

```usage
usage: dvc dag [-h] [-q | -v] [--dot] [--full] [target]

positional arguments:
target Stage or output to show pipeline for (optional)
Finds all stages in the workspace by default.
```

## Description

A data pipeline, in general, is a series of data processing
[stages](/doc/command-reference/run) (for example console commands that take an
input and produce an <abbr>output</abbr>). A pipeline may produce intermediate
data, and has a final result. Machine learning (ML) pipelines typically start
with large raw datasets, include intermediate featurization and training stages,
and produce a final model, as well as accuracy
[metrics](/doc/command-reference/metrics).

In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified with `dvc add` and `dvc run`,
among other commands. This allows DVC to restore one or more pipelines of stages
interconnected by their dependencies and outputs later. (See `dvc repro`.)

> DVC builds a dependency graph
> ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)) to do this.

`dvc dag` displays the stages of a pipeline up to the target stage. If `target`
is omitted, it will show the full project DAG.
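
As a rough illustration of "the stages of a pipeline up to the target stage", here is a minimal Python sketch that walks a dependency graph upstream from a target. It is not DVC's implementation: the `DEPENDS_ON` mapping and the `stages_leading_to` helper are hypothetical, and the stage names are borrowed from the example at the end of this page.

```python
from collections import deque

# Hypothetical stage graph: each stage maps to the stages it depends on.
DEPENDS_ON = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["featurize", "train"],
}


def stages_leading_to(target, depends_on):
    """Return the target plus every stage it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        stage = queue.popleft()
        for upstream in depends_on.get(stage, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen


# Only the ancestors of "train" (and "train" itself) are selected.
print(sorted(stages_leading_to("train", DEPENDS_ON)))
# prints ['featurize', 'prepare', 'train']
```

With no target, the whole mapping would be shown instead, which corresponds to the full project DAG.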
Comment on lines +35 to +36
Contributor

  • Is it possible to have more than one DAG?
  • Should we call it always "pipeline" or always "DAG" or always "dependency graph"? We currently have a mix of all these everywhere 😕

Contributor Author

Yes, 2 DAGs together is 1 DAG too, just with weakly connected components :)

A pipeline is a DAG in which all stages are somehow connected to each other, so that's not quite it. We could call it pipelineS, but `dvc add foo` is not quite a pipeline, strictly speaking. "Dependency graph" sounds like the most correct term, but DAG is a synonym, hence https://dagshub.com/ :) So we could, and probably should, use them interchangeably.

This doc is adapted from the old dvc pipeline show so it does suffer from some legacy sentence structure :(
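
The distinction made in this reply (one project DAG that may consist of several weakly connected components, each of which is a pipeline in the stricter sense) can be sketched in Python. This is only an illustration under assumed names: the `GRAPH` mapping and the `weakly_connected_components` helper are hypothetical and do not come from DVC.

```python
# Hypothetical project graph: each node maps to the nodes it depends on.
# "foo" stands for a file tracked with `dvc add foo`; it has no connections.
GRAPH = {
    "prepare": [],
    "train": ["prepare"],
    "foo": [],
}


def weakly_connected_components(depends_on):
    """Group nodes that are connected when edge direction is ignored."""
    # Build an undirected adjacency map from the directed dependencies.
    neighbors = {stage: set() for stage in depends_on}
    for stage, deps in depends_on.items():
        for dep in deps:
            neighbors.setdefault(dep, set()).add(stage)
            neighbors[stage].add(dep)

    components = []
    unvisited = set(neighbors)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbors[node]:
                if other in unvisited:
                    unvisited.remove(other)
                    component.add(other)
                    stack.append(other)
        components.append(component)
    return components


# Two components: the prepare/train pipeline and the lone "foo" node.
for component in weakly_connected_components(GRAPH):
    print(sorted(component))
```

Taken together, the components still form a single DAG, which is why "DAG" and "dependency graph" can describe the whole project while "pipeline" fits one connected part.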

Comment on lines +35 to +36
Contributor

I think we should start the description with this paragraph though. Since it's now a specific command after all 🙂

Contributor

Never mind, the short intro is enough. That one needs updating though... Will do.


## Options

- `--dot` - show DAG in
[DOT](<https://en.wikipedia.org/wiki/DOT_(graph_description_language)>)
format. It can be passed to third party visualization utilities.

- `--full` - show the full DAG that the `target` belongs to, instead of showing
the part that consists only of the target's ancestors.

- `-h`, `--help` - print the usage/help message and exit.

- `-q`, `--quiet` - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.

- `-v`, `--verbose` - display detailed tracing information.
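
To give a feel for what a DOT description of a dependency graph looks like, here is a minimal Python sketch that serializes a small graph to DOT text. The `DEPENDS_ON` mapping and the `to_dot` helper are hypothetical, and the exact text that `dvc dag --dot` emits may differ from this.

```python
# Hypothetical stage graph: each stage maps to the stages it depends on.
DEPENDS_ON = {
    "prepare": [],
    "featurize": ["prepare"],
    "train": ["featurize"],
    "evaluate": ["featurize", "train"],
}


def to_dot(depends_on):
    """Serialize the graph as DOT, with edges pointing from dependency to dependent."""
    lines = ["strict digraph {"]
    # Declare every node first so isolated stages are not lost.
    for stage in sorted(depends_on):
        lines.append(f'"{stage}";')
    for stage in sorted(depends_on):
        for dep in sorted(depends_on[stage]):
            lines.append(f'"{dep}" -> "{stage}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(DEPENDS_ON))
```

Text in this format can be rendered by tools that understand DOT, such as Graphviz.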

## Paging the output

This command's output is automatically piped to
[Less](<https://en.wikipedia.org/wiki/Less_(Unix)>), if available in the
terminal. (The exact command used is `less --chop-long-lines --clear-screen`.)
If `less` is not available (e.g. on Windows), the output is simply printed out.

> It's also possible to
> [enable Less paging on Windows](/doc/user-guide/running-dvc-on-windows#enabling-paging-with-less).

### Providing a custom pager

It's possible to override the default pager via the `DVC_PAGER` environment
variable. For example, the following command will replace the default pager with
[`more`](<https://en.wikipedia.org/wiki/More_(command)>), for a single run:

```dvc
$ DVC_PAGER=more dvc dag
```

For a persistent change, define `DVC_PAGER` in the shell configuration. For
example in Bash, we could add the following line to `~/.bashrc`:

```bash
export DVC_PAGER=more
```
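
The selection order described in this section (custom pager first, then Less if available, otherwise plain printing) can be sketched in Python. This is an assumption-laden illustration rather than DVC's actual code: `choose_pager` and `DEFAULT_PAGER` are hypothetical names, and only the `DVC_PAGER` variable and the default `less` command line come from the text above.

```python
import os
import shutil

# Default pager command quoted in the section above.
DEFAULT_PAGER = "less --chop-long-lines --clear-screen"


def choose_pager(environ=None):
    """Return the pager command to use, or None to print output directly."""
    environ = os.environ if environ is None else environ
    custom = environ.get("DVC_PAGER")
    if custom:
        # An explicit DVC_PAGER always wins over the default.
        return custom
    if shutil.which("less"):
        return DEFAULT_PAGER
    # No pager available (for example on a stock Windows terminal).
    return None


print(choose_pager({"DVC_PAGER": "more"}))  # prints more
```

Passing the environment as a parameter keeps the sketch easy to try without changing the real shell configuration.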

## Examples

Visualize a DVC pipeline:

```dvc
$ dvc dag
         +---------+
         | prepare |
         +---------+
              *
              *
              *
        +-----------+
        | featurize |
        +-----------+
         **        **
       **            *
      *               **
+-------+               *
| train |             **
+-------+            *
        **         **
          **     **
            *   *
        +----------+
        | evaluate |
        +----------+
```
5 changes: 2 additions & 3 deletions content/docs/command-reference/init.md
@@ -61,9 +61,8 @@ sub-projects to mitigate the issues of initializing in the Git repository root:
download files and directories, to reproduce <abbr>pipelines</abbr>, etc. It
can be expensive in the large repositories with a lot of projects.

- - Not enough isolation/granularity - commands like `dvc metrics diff`,
-   `dvc pipeline show` and others by default dump all the metrics, all the
-   pipelines, etc.
+ - Not enough isolation/granularity - commands like `dvc metrics diff`, `dvc dag`
+   and others by default dump all the metrics, all the pipelines, etc.

#### How does it affect DVC commands?

47 changes: 0 additions & 47 deletions content/docs/command-reference/pipeline/index.md

This file was deleted.

41 changes: 0 additions & 41 deletions content/docs/command-reference/pipeline/list.md

This file was deleted.

156 changes: 0 additions & 156 deletions content/docs/command-reference/pipeline/show.md

This file was deleted.

4 changes: 2 additions & 2 deletions content/docs/command-reference/repro.md
@@ -129,8 +129,8 @@ only execute the final stage.
The stage is only executed if the user types "y".

- `-p`, `--pipeline` - reproduce the entire pipelines that the stage file
-   `targets` belong to. Use `dvc pipeline show <target>.dvc` to show the parent
-   pipeline of a target stage.
+   `targets` belong to. Use `dvc dag <target>` to show the parent pipeline of a
+   target stage.

- `-P`, `--all-pipelines` - reproduce all pipelines, for all the stage files
present in `DVC` repository.