term: DVC-file -> .dvc file from Utkarsh work (2nd chunk) #1403

Closed
28 changes: 14 additions & 14 deletions content/docs/command-reference/import-url.md
@@ -2,7 +2,7 @@

Download a file or directory from a supported URL (for example `s3://`,
`ssh://`, and other protocols) into the <abbr>workspace</abbr>, and track
changes in the remote data source. Creates a DVC-file.
changes in the remote data source. Creates a `.dvc` file.

> See `dvc import` to download and track data/model files or directories from
> other <abbr>DVC repositories</abbr> (e.g. hosted on GitHub).
@@ -41,11 +41,11 @@ while `out` can be used to specify the directory and/or file name desired for
the downloaded data. If an existing directory is specified, the file or
directory will be placed inside.

[DVC-files](/doc/user-guide/dvc-file-format) support references to data in an
[`.dvc` files](/doc/user-guide/dvc-file-format) support references to data in an
external location, see
[External Dependencies](/doc/user-guide/external-dependencies). In such a
DVC-file, the `deps` field stores the remote URL, and the `outs` field contains
the corresponding local path in the <abbr>workspace</abbr>. It records enough
[External Dependencies](/doc/user-guide/external-dependencies). In such a `.dvc`
file, the `deps` field stores the remote URL, and the `outs` field contains the
corresponding local path in the <abbr>workspace</abbr>. It records enough
metadata about the imported data to let DVC efficiently determine whether
the local copy is out of date.

@@ -102,7 +102,7 @@ $ dvc run -d https://example.com/path/to/data.csv \
wget https://example.com/path/to/data.csv -O data.csv
```

Both methods generate a [DVC-files](/doc/user-guide/dvc-file-format) with an
Both methods generate a [`.dvc` file](/doc/user-guide/dvc-file-format) with an
external dependency, but the one created by `dvc import-url` preserves the
connection to the data source. We call this an _import stage_.

@@ -113,9 +113,9 @@ up to date from the external data source.
## Options

- `-f <filename>`, `--file <filename>` - specify a path and/or file name for the
DVC-file created by this command (e.g. `-f stages/stage.dvc`). This overrides
the default file name: `<file>.dvc`, where `<file>` is the desired file name
of the imported data (`out`).
`.dvc` file created by this command (e.g. `-f stages/stage.dvc`). This
overrides the default file name: `<file>.dvc`, where `<file>` is the desired
file name of the imported data (`out`).

- `-h`, `--help` - prints the usage/help message, and exit.

@@ -167,7 +167,7 @@ To track the changes with git, run:
git add data.xml.dvc data/.gitignore
```

Let's take a look at the resulting stage file (DVC-file) `data.xml.dvc`:
Let's take a look at the resulting stage file (`.dvc` file) `data.xml.dvc`:

```yaml
md5: 61e80c38c1ce04ed2e11e331258e6d0d
@@ -183,7 +183,7 @@ outs:
persist: false
```

The `etag` field in the DVC-file contains the
The `etag` field in the `.dvc` file contains the
[ETag](https://en.wikipedia.org/wiki/HTTP_ETag) recorded from the HTTP request.
If the remote file changes, its ETag will be different. This metadata allows DVC
to determine whether it's necessary to download it again.
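
The check described above can be sketched in a few lines of Python. This is only an illustration of the idea (compare the identifier recorded at import time with the one observed now), not DVC's actual implementation. The names `file_md5` and `is_outdated` are hypothetical, and a local MD5 hash stands in for the ETag so that the example runs without network access.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_outdated(recorded, current):
    """The source counts as changed when its identifier differs from the recorded one."""
    return recorded != current


# Hypothetical data source: a temporary file standing in for the remote data.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.xml"
    source.write_text("<rows><row id='1'/></rows>")
    recorded = file_md5(source)  # value stored at import time

    # Nothing changed yet, so no new download would be needed.
    unchanged = is_outdated(recorded, file_md5(source))

    # Any edit to the source produces a different identifier.
    source.write_text("<rows><row id='1'/><row id='2'/></rows>")
    changed = is_outdated(recorded, file_md5(source))
```

The same comparison applies whether the recorded identifier is an ETag reported by an HTTP server or an `md5` hash of a local file.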
@@ -241,7 +241,7 @@ outs:
persist: false
```

The DVC-file is nearly the same as in the previous example. The difference is
The `.dvc` file is nearly the same as in the previous example. The difference is
that the dependency (`deps`) now references the local file in the data store
directory we created previously. (Its `path` has the URL for the data store.)
And instead of an `etag` we have an `md5` hash value. We did this so it's easy to
@@ -309,8 +309,8 @@ Data and pipelines are up to date.

In the data store directory, edit `data.xml`. It doesn't matter what you change,
as long as it remains a valid XML file, because any change will result in a
different dependency file hash (`md5`) in the import stage DVC-file. Once we do
so, we can run `dvc update` to make sure the import stage is up to date:
different dependency file hash (`md5`) in the import stage `.dvc` file. Once we
do so, we can run `dvc update` to make sure the import stage is up to date:

```dvc
$ dvc update data.xml.dvc
35 changes: 18 additions & 17 deletions content/docs/command-reference/import.md
@@ -2,7 +2,7 @@

Download a file or directory tracked by DVC or by Git into the
<abbr>workspace</abbr>. It also creates a
[DVC-file](/doc/user-guide/dvc-file-format) with information about the data
[`.dvc` file](/doc/user-guide/dvc-file-format) with information about the data
source, which can later be used to [update](/doc/command-reference/update) the
import.

@@ -44,7 +44,7 @@ The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including those tracked by DVC, or by Git. Note
that DVC-tracked targets should be found in a
[DVC-file](/doc/user-guide/dvc-file-format) of the project.
[`.dvc` file](/doc/user-guide/dvc-file-format) of the project.

⚠️ The project should have a default
[DVC remote](/doc/command-reference/remote), containing the actual data for this
@@ -55,15 +55,16 @@ command to work.

After running this command successfully, the imported data is placed in the
current working directory (unless `-o` is used) with its original file name e.g.
`data.txt`. An _import stage_ (DVC-file) is also created in the same location,
extending the name of the imported data e.g. `data.txt.dvc` – similar to having
used `dvc run` to generate the data as a stage <abbr>output</abbr>.
`data.txt`. An _import stage_ (`.dvc` file) is also created in the same
location, extending the name of the imported data e.g. `data.txt.dvc` – similar
to having used `dvc run` to generate the data as a stage <abbr>output</abbr>.

DVC-files support references to data in an external DVC repository (hosted on a
Git server). In such a DVC-file, the `deps` field specifies the remote `url` and
data `path`, and the `outs` field contains the corresponding local path in the
<abbr>workspace</abbr>. It records enough metadata about the imported data to
enable DVC efficiently determining whether the local copy is out of date.
`.dvc` files support references to data in an external DVC repository (hosted on
a Git server). In such a `.dvc` file, the `deps` field specifies the remote
`url` and data `path`, and the `outs` field contains the corresponding local
path in the <abbr>workspace</abbr>. It records enough metadata about the
imported data to let DVC efficiently determine whether the local copy is
out of date.

To actually
[track the data](https://dvc.org/doc/tutorials/get-started/data-versioning),
@@ -112,8 +113,8 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)'

In contrast with `dvc get`, this command doesn't just download the data file,
but it also creates an import stage
([DVC-file](/doc/user-guide/dvc-file-format)) with a link to the data source (as
explained in the description above). (This import stage can later be used to
([`.dvc` file](/doc/user-guide/dvc-file-format)) with a link to the data source
(as explained in the description above). (This import stage can later be used to
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`:

```yaml
@@ -152,7 +153,7 @@ Importing
```

When using this option, the import stage
([DVC-file](/doc/user-guide/dvc-file-format)) will also have a `rev` subfield
([`.dvc` file](/doc/user-guide/dvc-file-format)) will also have a `rev` subfield
under `repo`:

```yaml
@@ -166,7 +167,7 @@ deps:

If `rev` is a Git branch or tag (where the underlying commit changes), the data
source may have updates at a later time. To bring it up to date if so (and
update `rev_lock` in the DVC-file), simply use `dvc update <stage>.dvc`. If
update `rev_lock` in the `.dvc` file), simply use `dvc update <stage>.dvc`. If
`rev` is a specific commit hash (does not change), `dvc update` without options
will not have an effect on the import stage. You may force-update it to a
different commit with `dvc update --rev`:
@@ -184,7 +185,7 @@ If you take a look at our
[dataset registry](https://github.com/iterative/dataset-registry)
<abbr>project</abbr>, you'll see that it's organized into different directories
such as `tutorial/ver` and `use-cases/`, and these contain
[DVC-files](/doc/user-guide/dvc-file-format) that track different datasets.
[`.dvc` files](/doc/user-guide/dvc-file-format) that track different datasets.
Given this simple structure, its data files can be easily shared among several
other projects using `dvc get` and `dvc import`. For example:

@@ -205,7 +206,7 @@ $ dvc import git@github.com:iterative/dataset-registry.git \
`dvc import` provides a better way to incorporate data files tracked in external
<abbr>DVC repositories</abbr> because it saves the connection between the
current project and the source repo. This means that enough information is
recorded in an import stage (DVC-file) in order to
recorded in an import stage (`.dvc` file) in order to
[reproduce](/doc/command-reference/repro) downloading of this same data version
in the future, where and when needed. This is achieved with the `repo` field,
for example (matching the import command above):
@@ -244,7 +245,7 @@ Importing ...
> Note that Git-tracked files can be imported from DVC repos as well.

The file is imported, and along with it, an import stage
([DVC-file](/doc/user-guide/dvc-file-format)) file is created. Check
([`.dvc` file](/doc/user-guide/dvc-file-format)) is created. Check
`it-standards.csv.dvc`:

```yaml
10 changes: 5 additions & 5 deletions content/docs/command-reference/list.md
@@ -16,12 +16,12 @@ positional arguments:

## Description

DVC, by effectively replacing data files, models, directories with DVC-files
DVC, by effectively replacing data files, models, directories with `.dvc` files
(`.dvc`), hides actual locations and names. This means that you don't see data
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
Github), you just see the DVC-files. This makes it hard to navigate the project
to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
`dvc.api`.
GitHub), you just see the `.dvc` files. This makes it hard to navigate the
project to find <abbr>data artifacts</abbr> for use with `dvc get`,
`dvc import`, or `dvc.api`.

`dvc list` prints a virtual view of a DVC repository, as if files and
directories [tracked by DVC](/doc/use-cases/versioning-data-and-model-files)
@@ -97,7 +97,7 @@ project's page, you will see a similar list, except that `model.pkl` will be
missing. That's because it's tracked by DVC and not visible to Git. You can find
it in the
[`train.dvc`](https://github.com/iterative/example-get-started/blob/master/train.dvc)
DVC-file (`outs` field).
`.dvc` file (`outs` field).

We can now, for example, download the model file with:

38 changes: 23 additions & 15 deletions content/docs/command-reference/pull.md
@@ -3,7 +3,7 @@
Download tracked files or directories from
[remote storage](/doc/command-reference/remote) to the <abbr>cache</abbr> and
<abbr>workspace</abbr>, based on the current
[DVC-files](/doc/user-guide/dvc-file-format).
[`.dvc` files](/doc/user-guide/dvc-file-format).

## Synopsis

@@ -13,8 +13,8 @@ usage: dvc pull [-h] [-q | -v] [-j <number>]
[targets [targets ...]]

positional arguments:
targets Limit command scope to these DVC-files. Using -R,
directories to search DVC-files in can also be given.
targets Limit command scope to these `.dvc` files. Using -R,
directories to search `.dvc` files in can also be given.
```

## Description
@@ -37,17 +37,17 @@ remote.

With no arguments, just `dvc pull` or `dvc pull --remote <name>`, it downloads
only the files (or directories) missing from the workspace by searching all
[DVC-files](/doc/user-guide/dvc-file-format) currently in the
[`.dvc` files](/doc/user-guide/dvc-file-format) currently in the
<abbr>project</abbr>. It will not download files associated with earlier commits
in the <abbr>repository</abbr> (if using Git), nor will it download files that
have not changed.

The command `dvc status -c` can list files referenced in current DVC-files, but
missing in the <abbr>cache</abbr>. It can be used to see what files `dvc pull`
would download.
The command `dvc status -c` can list files referenced in current `.dvc` files,
but missing in the <abbr>cache</abbr>. It can be used to see what files
`dvc pull` would download.

If one or more `targets` are specified, DVC only considers the files associated
with those DVC-files. Using the `--with-deps` option, DVC tracks dependencies
with those `.dvc` files. Using the `--with-deps` option, DVC tracks dependencies
backward from the target [stage files](/doc/command-reference/run), through the
corresponding [pipelines](/doc/command-reference/pipeline), to find data files
to pull.
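
The backward search described for `--with-deps` can be pictured as a graph traversal from the target stage to everything upstream of it. The sketch below assumes a hypothetical four-stage pipeline and a made-up helper `stages_to_pull`; it illustrates the idea only and is not how DVC is implemented.

```python
def stages_to_pull(upstream, target):
    """Return the target stage plus every stage it depends on, directly or indirectly."""
    visited = set()
    ordered = []
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage in visited:
            continue
        visited.add(stage)
        ordered.append(stage)
        # Walk backward: only stages this one depends on are followed.
        stack.extend(upstream.get(stage, []))
    return ordered


# Hypothetical pipeline: each stage maps to the stages it depends on.
upstream = {
    "prepare.dvc": [],
    "featurize.dvc": ["prepare.dvc"],
    "train.dvc": ["featurize.dvc"],
    "evaluate.dvc": ["train.dvc"],
}

selected = stages_to_pull(upstream, "train.dvc")
```

Here `evaluate.dvc` comes after the target in the pipeline, so it is never visited and its data would not be considered.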
@@ -58,8 +58,8 @@ reflinks or hardlinks to put it in the workspace without copying. See

## Options

- `-a`, `--all-branches` - determines the files to download by examining
DVC-files in all Git branches instead of just those present in the current
- `-a`, `--all-branches` - determines the files to download by examining `.dvc`
files in all Git branches instead of just those present in the current
workspace. It's useful if branches are used to track experiments or project
checkpoints. Note that this can be combined with `-T` below, for example using
the `-aT` flag.
@@ -74,19 +74,19 @@ reflinks or hardlinks to put it in the workspace without copying. See
entire existing commit history of the project.

- `-d`, `--with-deps` - determines files to download by tracking dependencies to
the target DVC-files (stages). If no `targets` are provided, this option is
the target `.dvc` files (stages). If no `targets` are provided, this option is
ignored. By traversing all stage dependencies, DVC searches backward from the
target stages in the corresponding pipelines. This means DVC will not pull
files referenced in later stages than the `targets`.

- `-R`, `--recursive` - determines the files to pull by searching each target
directory and its subdirectories for DVC-files to inspect. If there are no
directory and its subdirectories for `.dvc` files to inspect. If there are no
directories among the `targets`, this option is ignored.

- `-f`, `--force` - does not prompt when removing workspace files, which occurs
when these file no longer match the current DVC-file references. This option
surfaces behavior from the `dvc fetch` and `dvc checkout` commands because
`dvc pull` in effect performs those 2 functions in a single command.
  when these files no longer match the current `.dvc` file references. This
option surfaces behavior from the `dvc fetch` and `dvc checkout` commands
because `dvc pull` in effect performs those 2 functions in a single command.

- `-r <name>`, `--remote <name>` - name of the
[remote storage](/doc/command-reference/remote) to pull from (see
@@ -217,8 +217,16 @@ $ dvc remote list
r1 ssh://_username_@_host_/path/to/dvc/remote/storage
```

<<<<<<< HEAD With the first `dvc pull` we specified a stage in the middle of
this pipeline (`matrix-train.p.dvc`) while using `--with-deps`. DVC started with
that `.dvc` file and searched backwards through the pipeline for data files to
download. Because the `model.p.dvc` stage occurs later, its data was not pulled.
=======

> DVC supports several
> [remote types](/doc/command-reference/remote/add#supported-storage-types).
>
> > > > > > > c8b720a9017ca9db2caae7fc9f521f6192fc4f4c

To download DVC-tracked data from a specific DVC remote, use the `--remote`
(`-r`) option of `dvc pull`: