Add functionality to link arbitrary strings in code blocks #1576

Merged
merged 7 commits into from
Jul 15, 2020
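
The diff in this PR replaces explicit Markdown links on `.dvc` and `dvc.yaml` with bare inline code, which suggests those links are now meant to be generated automatically. The following is only a minimal Python sketch of that idea, not the site's actual implementation. The `LINKED_TERMS` table and the `autolink` function are hypothetical names; the two URLs are the link targets removed in the diff. Unlike the removed links, the sketch wraps only the code span itself, not a following word such as "file".

```python
import re

# Hypothetical lookup table. The URLs are the targets of the links
# that this PR removes from the docs.
LINKED_TERMS = {
    "dvc.yaml": "/doc/user-guide/dvc-files-and-directories#dvcyaml-file",
    ".dvc": "/doc/user-guide/dvc-files-and-directories#dvc-files",
}

# First alternative: an inline code span that is already a Markdown link.
# Second alternative: a bare inline code span.
PATTERN = re.compile(r"(\[`[^`\n]+`\]\([^)\n]*\))|`([^`\n]+)`")


def autolink(markdown: str) -> str:
    """Wrap known inline code spans in links, leaving existing links alone."""

    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            # Already linked: keep the text exactly as written.
            return match.group(0)
        term = match.group(2)
        url = LINKED_TERMS.get(term)
        if url is None:
            # Unknown term: leave the code span untouched.
            return match.group(0)
        return f"[`{term}`]({url})"

    return PATTERN.sub(replace, markdown)
```

Matching already-linked spans first keeps the function idempotent, so running it twice over the same text does not nest links.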
9 changes: 4 additions & 5 deletions content/docs/api-reference/get_url.md
Expand Up @@ -29,11 +29,10 @@ Returns the URL string of the storage location (in a
specified by its `path` in a `repo` (<abbr>DVC project</abbr>), is stored.

The URL is formed by reading the project's
[remote configuration](/doc/command-reference/config#remote) and the
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) or
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) where the
given `path` is found (`outs` field). The schema of the URL returned depends on
the [type](/doc/command-reference/remote/add#supported-storage-types) of the
[remote configuration](/doc/command-reference/config#remote) and the `dvc.yaml`
or `.dvc` file where the given `path` is found (`outs` field). The schema of the
URL returned depends on the
[type](/doc/command-reference/remote/add#supported-storage-types) of the
`remote` used (see the [Parameters](#parameters) section).

If the target is a directory, the returned URL will end in `.dir`. Refer to
46 changes: 20 additions & 26 deletions content/docs/command-reference/add.md
@@ -1,7 +1,7 @@
# add

Track data files or directories with DVC, by creating a corresponding
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files).
Track data files or directories with DVC, by creating a corresponding `.dvc`
file.

## Synopsis

Expand All @@ -16,9 +16,8 @@ positional arguments:
## Description

The `dvc add` command is analogous to `git add`, in that it makes DVC aware of
the target data, in order to start versioning it. It creates a
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) to track the
added data.
the target data, in order to start versioning it. It creates a `.dvc` file to
track the added data.

This command can be used to
[version control](/doc/use-cases/versioning-data-and-model-files) large files,
Expand All @@ -43,27 +42,24 @@ each one:
for more details.)
3. Attempt to replace the file with a link to the cached data (more details on
file linking further down).
4. Create a corresponding
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) to track
the file, using its path and hash to identify the cached data. The `.dvc`
file lists the DVC-tracked file as an <abbr>output</abbr> (`outs` field).
Unless the `--file` option is used, the `.dvc` file name generated by default
is `<file>.dvc`, where `<file>` is the file name of the first target.
4. Create a corresponding `.dvc` file to track the file, using its path and hash
to identify the cached data. The `.dvc` file lists the DVC-tracked file as an
<abbr>output</abbr> (`outs` field). Unless the `--file` option is used, the
`.dvc` file name generated by default is `<file>.dvc`, where `<file>` is the
file name of the first target.
5. Add the `targets` to `.gitignore` in order to prevent them from being
committed to the Git repository (unless `dvc init --no-scm` was used when
initializing the DVC project).
6. Instructions are printed showing `git` commands for adding the files, if
appropriate.
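
Step 4 of the list above can be illustrated with a short Python sketch. It is a simplification under stated assumptions and not how DVC is implemented: `write_dvc_file_sketch` is a hypothetical helper, the YAML layout is reduced to an `outs` entry with `md5` and `path`, and caching, linking, and `.gitignore` handling (steps 3 and 5) are left out.

```python
import hashlib
from pathlib import Path


def write_dvc_file_sketch(target: Path) -> Path:
    """Write an illustrative `<file>.dvc` next to `target` (hypothetical helper)."""
    # Hash the file contents; the hash identifies the cached data.
    md5 = hashlib.md5(target.read_bytes()).hexdigest()

    # Default name: `<file>.dvc`, where `<file>` is the target's file name.
    dvc_file = target.with_name(target.name + ".dvc")

    # List the tracked file as an output (`outs` field).
    dvc_file.write_text(
        "outs:\n"
        f"- md5: {md5}\n"
        f"  path: {target.name}\n"
    )
    return dvc_file
```

Calling `write_dvc_file_sketch(Path("data.xml"))` on an existing file would produce `data.xml.dvc` beside it, which is the small file that Git then tracks in place of the data.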

Summarizing, the result is that the target data is replaced by small
[`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) that can be
easily tracked with Git.
Summarizing, the result is that the target data is replaced by small `.dvc`
files that can be easily tracked with Git.

> Note that `.dvc` files can be considered _orphan stages_, because they have no
> <abbr>dependencies</abbr>, only outputs. These are treated as _always changed_
> by `dvc status` and `dvc repro`, which always executes them. See
> [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) to learn
> more about stages.
> by `dvc status` and `dvc repro`, which always executes them. See `dvc.yaml` to
> learn more about stages.

To avoid adding files inside a directory accidentally, you can add the
corresponding [patterns](/doc/user-guide/dvcignore) in a `.dvcignore` file.
Expand All @@ -78,8 +74,8 @@ large files. DVC also supports other link types for use on file systems without
### Tracking directories

A `dvc add` target can be an individual file or a directory. In the latter case,
a [`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) is created
for the top of the directory (with default name `<dir_name>.dvc`).
a `.dvc` file is created for the top of the directory (with default name
`<dir_name>.dvc`).

Every file in the hierarchy is added to the cache (unless the `--no-commit`
option is used), but DVC does not produce individual `.dvc` files for each file
Expand Down Expand Up @@ -139,9 +135,8 @@ To track the changes with git, run:
git add .gitignore data.xml.dvc
```

As indicated above, a
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) has been
created for `data.xml`. Let's explore the result:
As indicated above, a `.dvc` file has been created for `data.xml`. Let's explore
the result:

```dvc
$ tree
Expand Down Expand Up @@ -192,10 +187,9 @@ Tracking a directory with DVC as simple as with a single file:
$ dvc add pics
```

There are no [`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files)
generated within this directory structure to match each image, but the image
files are all <abbr>cached</abbr>. A single `pics.dvc` file is generated for the
top-level directory, and it contains:
There are no `.dvc` files generated within this directory structure to match
each image, but the image files are all <abbr>cached</abbr>. A single `pics.dvc`
file is generated for the top-level directory, and it contains:

```yaml
outs:
5 changes: 2 additions & 3 deletions content/docs/command-reference/dag.md
@@ -1,8 +1,7 @@
# dag

Visualize the pipeline(s) in
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvclock-file) as one or
more graph(s) of connected [stages](/doc/command-reference/run).
Visualize the pipeline(s) in `dvc.yaml` as one or more graph(s) of connected
[stages](/doc/command-reference/run).

## Synopsis

19 changes: 7 additions & 12 deletions content/docs/command-reference/fetch.md
Expand Up @@ -23,9 +23,7 @@ of the project, but without placing them in the <abbr>workspace</abbr>. This
makes the data files available for linking (or copying) into the workspace.
(Refer to [dvc config cache.type](/doc/command-reference/config#cache).) Along
with `dvc checkout`, it's performed automatically by `dvc pull` when the target
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) or
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files are not
already in the cache:
`dvc.yaml` or `.dvc` files are not already in the cache:

```
Controlled files Commands
Expand Down Expand Up @@ -199,11 +197,10 @@ Note that the `.dvc/cache` directory was created and populated.
> for more info.

Used without arguments (as above), `dvc fetch` downloads all assets needed by
all [`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) and
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files in the
current branch, including for directories. The hash values
`3863d0e317dee0a55c4e59d2ec0eef33` and `42c7025fc0edeb174069280d17add2d4`
correspond to the `model.pkl` file and `data/features/` directory, respectively.
all `dvc.yaml` and `.dvc` files in the current branch, including for
directories. The hash values `3863d0e317dee0a55c4e59d2ec0eef33` and
`42c7025fc0edeb174069280d17add2d4` correspond to the `model.pkl` file and
`data/features/` directory, respectively.

Let's now link files from the cache to the workspace with:

Expand All @@ -217,8 +214,7 @@ $ dvc checkout
> follow this example if you tried the previous one (**Default behavior**).

`dvc fetch` only downloads the data files of a specific stage when the
corresponding [`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files)
(command target) is specified:
corresponding `.dvc` file (command target) is specified:

```dvc
$ dvc fetch prepare.dvc
Expand Down Expand Up @@ -283,8 +279,7 @@ $ tree .dvc/cache
└── a9c512fda11293cfee7617b66648dc
```

Fetching using `--with-deps` starts with the target
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) (`train.dvc`)
Fetching using `--with-deps` starts with the target `.dvc` file (`train.dvc`)
and searches backwards through its pipeline for data to download into the
project's cache. All the data for the second and third stages ("featurize" and
"train") has now been downloaded to the cache. We could now use `dvc checkout`
15 changes: 5 additions & 10 deletions content/docs/command-reference/get.md
Expand Up @@ -39,9 +39,7 @@ downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).
The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including those tracked by DVC, or by Git. Note
that DVC-tracked targets should be found in a
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) or
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) file of the
that DVC-tracked targets should be found in a `dvc.yaml` or `.dvc` file of the
project.

⚠️ The project should have a default
Expand Down Expand Up @@ -96,9 +94,8 @@ model.pkl

Note that the `model.pkl` file doesn't actually exist in the
[root directory](https://github.com/iterative/example-get-started/tree/master/)
of the source Git repo. Instead, it's exported in the
[`dvc.yaml`](https://github.com/iterative/example-get-started/blob/master/dvc.yaml)
file as an output of the `train` stage (in the `outs` field). DVC then
of the source Git repo. Instead, it's exported in the `dvc.yaml` file as an
output of the `train` stage (in the `outs` field). DVC then
[pulls](/doc/command-reference/pull) the file from the default
[remote](/doc/command-reference/remote) of the source DVC project (found in its
[config file](https://github.com/iterative/example-get-started/blob/master/.dvc/config)).
Expand Down Expand Up @@ -182,10 +179,8 @@ The `model.monograms.pkl` file now contains the older version of the model. To
get the most recent one, we use a similar command, but with
`-o model.bigrams.pkl` and `--rev bigrams-experiment` (or even without `--rev`
since that tag has the latest model version anyway). In fact, in this case using
`dvc pull` with the corresponding
[`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) should
suffice, downloading the file as just `model.pkl`. We can then rename it to make
its variant explicit:
`dvc pull` with the corresponding `.dvc` files should suffice, downloading the
file as just `model.pkl`. We can then rename it to make its variant explicit:

```dvc
$ dvc pull train.dvc
15 changes: 5 additions & 10 deletions content/docs/command-reference/import-url.md
Expand Up @@ -2,8 +2,7 @@

Download a file or directory from a supported URL (for example `s3://`,
`ssh://`, and other protocols) into the <abbr>workspace</abbr>, and track
changes in the remote data source. Creates a
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files).
changes in the remote data source. Creates a `.dvc` file.

> See `dvc import` to download and track data/model files or directories from
> other <abbr>DVC repositories</abbr> (e.g. hosted on GitHub).
Expand Down Expand Up @@ -43,8 +42,7 @@ while `out` can be used to specify the directory and/or file name desired for
the downloaded data. If an existing directory is specified, the file or
directory will be placed inside.

[`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) support
references to data in an external location, see
`.dvc` files support references to data in an external location, see
[External Dependencies](/doc/user-guide/external-dependencies). In such an
import `.dvc` file, the `deps` field stores the remote URL, and the `outs` field
contains the corresponding local path in the <abbr>workspace</abbr>. It records
Expand Down Expand Up @@ -109,10 +107,8 @@ $ dvc run -n download_data \
wget https://data.dvc.org/get-started/data.xml -O data.xml
```

`dvc import-url` generates an import stage
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) and `dvc run`
a regular stage (in
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file)).
`dvc import-url` generates an import stage `.dvc` file and `dvc run` a regular
stage (in `dvc.yaml`).

## Options

Expand Down Expand Up @@ -189,8 +185,7 @@ The `etag` field in the `.dvc` file contains the
If the remote file changes, its ETag will be different. This metadata allows DVC
to determine whether it's necessary to download it again.
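
The ETag check described above can be sketched in a few lines of Python. This is an illustration only: `needs_download` is a hypothetical function, the dictionary mirrors a parsed import `.dvc` file under the assumption that `etag` sits next to the URL in the first `deps` entry, and the ETag strings are made up. Fetching the current ETag from the remote is left to the caller.

```python
def needs_download(dvc_file: dict, current_etag: str) -> bool:
    """Return True when the recorded ETag no longer matches the remote one."""
    # Assumption: the import `.dvc` file keeps the ETag in its first `deps` entry.
    recorded = dvc_file["deps"][0].get("etag")
    # A missing or different ETag means the remote data may have changed.
    return recorded is None or recorded != current_etag


# Parsed import `.dvc` file (illustrative values, not real metadata).
imported = {
    "deps": [{"etag": '"abc123"', "path": "https://data.dvc.org/get-started/data.xml"}],
    "outs": [{"path": "data.xml"}],
}

print(needs_download(imported, '"abc123"'))  # same ETag: no download needed
print(needs_download(imported, '"def456"'))  # changed ETag: download again
```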

> See [`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) for
> more details on the format above.
> See `.dvc` files for more details on the format above.

You may want to get out of and remove the `example-get-started/` directory after
trying this example (especially if trying out the following one).
36 changes: 14 additions & 22 deletions content/docs/command-reference/import.md
@@ -1,10 +1,9 @@
# import

Download a file or directory tracked by DVC or by Git into the
<abbr>workspace</abbr>. It also creates a
[`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files) with
information about the data source, which can later be used to
[update](/doc/command-reference/update) the import.
<abbr>workspace</abbr>. It also creates a `.dvc` file with information about the
data source, which can later be used to [update](/doc/command-reference/update)
the import.

> See also our `dvc.api.open()` Python API function.

Expand Down Expand Up @@ -43,9 +42,7 @@ downloading, DVC will try to copy the target data from its <abbr>cache</abbr>).
The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including those tracked by DVC, or by Git. Note
that DVC-tracked targets should be found in a
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) or
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) file of the
that DVC-tracked targets should be found in a `dvc.yaml` or `.dvc` file of the
project.

⚠️ The project should have a default
Expand Down Expand Up @@ -114,11 +111,9 @@ Importing 'data/data.xml ([email protected]:iterative/example-get-started)'
```

In contrast with `dvc get`, this command doesn't just download the data file,
but it also creates an import stage
([`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files)) with a link
to the data source (as explained in the description above). (This import stage
can later be used to [update](/doc/command-reference/update) the import.) Check
`data.xml.dvc`:
but it also creates an import stage (`.dvc` file) with a link to the data source
[Review comment by a contributor] Note for self: replace all these "import stage" term instances in a separate PR.

(as explained in the description above). (This import stage can later be used to
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`:

```yaml
md5: 7de90e7de7b432ad972095bc1f2ec0f8
Expand Down Expand Up @@ -155,9 +150,8 @@ Importing
-> 'cats-dogs'
```

When using this option, the import stage
([`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files)) will also
have a `rev` subfield under `repo`:
When using this option, the import stage (`.dvc` file) will also have a `rev`
subfield under `repo`:

```yaml
deps:
Expand Down Expand Up @@ -187,10 +181,9 @@ $ dvc update --rev cats-dogs-v2
If you take a look at our
[dataset registry](https://github.com/iterative/dataset-registry)
<abbr>project</abbr>, you'll see that it's organized into different directories
such as `tutorial/ver` and `use-cases/`, and these contain
[`.dvc` files](/doc/user-guide/dvc-files-and-directories#dvc-files) that track
different datasets. Given this simple structure, its data files can be easily
shared among several other projects using `dvc get` and `dvc import`. For
such as `tutorial/ver` and `use-cases/`, and these contain `.dvc` files that
track different datasets. Given this simple structure, its data files can be
easily shared among several other projects using `dvc get` and `dvc import`. For
example:

```dvc
Expand Down Expand Up @@ -245,9 +238,8 @@ Importing ...

> Note that Git-tracked files can be imported from DVC repos as well.

The file is imported, and along with it, an import stage
([`.dvc` file](/doc/user-guide/dvc-files-and-directories#dvc-files)) is created.
Check `it-standards.csv.dvc`:
The file is imported, and along with it, an import stage (`.dvc` file) is
created. Check `it-standards.csv.dvc`:

```yaml
deps:
13 changes: 5 additions & 8 deletions content/docs/command-reference/init.md
Expand Up @@ -55,11 +55,9 @@ sub-projects to mitigate the issues of initializing in the Git repository root:
different remote storages, for example, for different sub-projects, etc.

- Not enough isolation/granularity - commands like `dvc pull`, `dvc checkout`,
and others analyze the whole repository to look for
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) or
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files to
download files and directories, to reproduce <abbr>pipelines</abbr>, etc. It
can be expensive in the large repositories with a lot of projects.
and others analyze the whole repository to look for `dvc.yaml` or `.dvc` files
to download files and directories, to reproduce <abbr>pipelines</abbr>, etc.
It can be expensive in the large repositories with a lot of projects.

- Not enough isolation/granularity - commands like `dvc metrics diff`, `dvc dag`
and others by default dump all the metrics, all the pipelines, etc.
Expand Down Expand Up @@ -125,9 +123,8 @@ include:

- SCM other than Git is being used. Even though there are DVC features that
require DVC to be run in the Git repo, DVC can work well with other version
control systems. Since DVC relies on simple
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) files to
manage <abbr>pipelines</abbr>, data, etc, they can be added into any SCM thus
control systems. Since DVC relies on simple `dvc.yaml` files to manage
<abbr>pipelines</abbr>, data, etc, they can be added into any SCM thus
providing large data files and directories versioning.

- There is no need to keep the history at all, e.g. having a deployment
13 changes: 5 additions & 8 deletions content/docs/command-reference/list.md
Expand Up @@ -19,11 +19,9 @@ positional arguments:
DVC, by effectively replacing data files, models, directories with `.dvc` files
(`.dvc`), hides actual locations and names. This means that you don't see data
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
Github), you just see the
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) and
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files. This makes
it hard to navigate the project to find <abbr>data artifacts</abbr> for use with
`dvc get`, `dvc import`, or `dvc.api`.
Github), you just see the `dvc.yaml` and `.dvc` files. This makes it hard to
navigate the project to find <abbr>data artifacts</abbr> for use with `dvc get`,
`dvc import`, or `dvc.api`.

`dvc list` prints a virtual view of a DVC repository, as if files and
directories [tracked by DVC](/doc/use-cases/versioning-data-and-model-files)
Expand Down Expand Up @@ -96,9 +94,8 @@ src
If you open the
[example-get-started](https://github.com/iterative/example-get-started)
project's page, you will see a similar list but the `model.pkl` file. It's
tracked by DVC and not visible to Git. It's exported in the
[`dvc.yaml`](https://github.com/iterative/example-get-started/blob/master/dvc.yaml)
file as an output of the `train` stage (in the `outs` field).
tracked by DVC and not visible to Git. It's exported in the `dvc.yaml` file as
an output of the `train` stage (in the `outs` field).

We can now, for example, download the model file with:

6 changes: 2 additions & 4 deletions content/docs/command-reference/metrics/diff.md
Expand Up @@ -32,10 +32,8 @@ The differences shown by this command include the new value, and numeric
difference (delta) from the previous value of metrics (rounded to 5 digits
precision). They're calculated between two commits (hash, branch, tag, or any
[Git revision](https://git-scm.com/docs/revisions)) for all metrics in the
<abbr>project</abbr>, found by examining all of the
[`dvc.yaml`](/doc/user-guide/dvc-files-and-directories#dvcyaml-file) and
[`.dvc`](/doc/user-guide/dvc-files-and-directories#dvc-files) files in both
versions.
<abbr>project</abbr>, found by examining all of the `dvc.yaml` and `.dvc` files
in both versions.
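
The calculation described above (new value plus numeric delta, rounded to 5 digits) can be sketched in Python as follows. This is an assumption-laden illustration, not DVC's code: `metrics_delta` is a hypothetical function, the metrics are plain dictionaries already loaded from the two commits, and rounding is applied to both the value and the delta.

```python
from typing import Dict, Optional


def metrics_delta(
    old: Dict[str, float], new: Dict[str, float], precision: int = 5
) -> Dict[str, Dict[str, Optional[float]]]:
    """Compare two metric snapshots and report the new value and its change."""
    rows = {}
    for name, value in new.items():
        previous = old.get(name)
        # A metric that did not exist in the older commit has no delta.
        change = None if previous is None else round(value - previous, precision)
        rows[name] = {"value": round(value, precision), "change": change}
    return rows


# Illustrative numbers only.
before = {"auc": 0.602818}
after = {"auc": 0.588426, "f1": 0.75}
print(metrics_delta(before, after))
```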

Another way to display metrics is the `dvc metrics show` command, which just
lists all the current metrics without comparisons.