DVC file -> .dvc file (2nd chunk) reopen #1408

Merged
merged 40 commits into from
Jun 12, 2020
Changes from 3 commits
767372c
term: DVC-file -> .dvc file from Utkarsh work (2nd chunk)
jorgeorpinel Jun 8, 2020
b752166
Merge branch '1.0/ter/dvcfile/basics-2' of https://github.com/iterati…
utkarshsingh99 Jun 8, 2020
adb4377
2nd chunk DVC-file -> .dvc file
utkarshsingh99 Jun 8, 2020
1c02bc6
Added links to first occurences of .dvc files in /basic-concepts/
utkarshsingh99 Jun 8, 2020
178d8b7
Formatting changes
utkarshsingh99 Jun 8, 2020
4de62ad
Update content/docs/user-guide/basic-concepts/dvc-project.md
jorgeorpinel Jun 8, 2020
63b57e1
Update content/docs/user-guide/basic-concepts/external-dependency.md
jorgeorpinel Jun 8, 2020
c8dd303
Merge branch 'dvc-file-2' of https://github.com/utkarshsingh99/dvc.or…
utkarshsingh99 Jun 8, 2020
1cbc5f3
Review changes - I
utkarshsingh99 Jun 8, 2020
df4eafa
Update content/docs/command-reference/pull.md
utkarshsingh99 Jun 8, 2020
2831f68
Review changes - II
utkarshsingh99 Jun 8, 2020
09159ab
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 9, 2020
481674b
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 9, 2020
1ce452b
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 9, 2020
ac13327
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 9, 2020
fe5c4b4
Update content/docs/command-reference/remove.md
jorgeorpinel Jun 9, 2020
082bce7
update content/docs/command-reference/pull.md push.md remove.md
utkarshsingh99 Jun 9, 2020
11b5369
update content/docs/command-reference/fetch.md status.md
utkarshsingh99 Jun 9, 2020
f92ba3f
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 10, 2020
df3bc8c
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 10, 2020
e162d4a
Update content/docs/command-reference/push.md
jorgeorpinel Jun 10, 2020
ea3d312
Update content/docs/command-reference/fetch.md
jorgeorpinel Jun 10, 2020
80ef90b
Update content/docs/command-reference/pull.md
jorgeorpinel Jun 10, 2020
bff23d6
Update content/docs/command-reference/push.md
jorgeorpinel Jun 10, 2020
16115a7
Update content/docs/command-reference/remove.md
jorgeorpinel Jun 10, 2020
9ff51d1
Update content/docs/command-reference/status.md
jorgeorpinel Jun 10, 2020
13c645d
Update content/docs/command-reference/update.md
jorgeorpinel Jun 10, 2020
cae7e3c
formatting content/docs/command-reference/pull.md
utkarshsingh99 Jun 10, 2020
fd335ee
update content/docs/command-reference/status push fetch
utkarshsingh99 Jun 10, 2020
b2aa66d
Update content/docs/command-reference/status.md
jorgeorpinel Jun 11, 2020
a043f37
update "content/docs/command-reference/fetch.md"
utkarshsingh99 Jun 11, 2020
64a731c
update content/docs/command-reference/import.md
utkarshsingh99 Jun 11, 2020
cccb75e
update content/docs/command-reference/import-url.md
utkarshsingh99 Jun 11, 2020
981e75b
update content/docs/command-reference/list.md
utkarshsingh99 Jun 11, 2020
753f4d2
Update content/docs/command-reference/status.md
jorgeorpinel Jun 11, 2020
54c15cf
update content/docs/command-reference/status.md
utkarshsingh99 Jun 12, 2020
d5e5b25
Conflicts fixed
utkarshsingh99 Jun 12, 2020
56179c4
Update content/docs/command-reference/status.md
jorgeorpinel Jun 12, 2020
fedd930
Update content/docs/command-reference/status.md
jorgeorpinel Jun 12, 2020
e4a491b
Update content/docs/command-reference/status.md
jorgeorpinel Jun 12, 2020
28 changes: 14 additions & 14 deletions content/docs/command-reference/import-url.md
@@ -2,7 +2,7 @@

Download a file or directory from a supported URL (for example `s3://`,
`ssh://`, and other protocols) into the <abbr>workspace</abbr>, and track
changes in the remote data source. Creates a DVC-file.
changes in the remote data source. Creates a `.dvc` file.

> See `dvc import` to download and tack data/model files or directories from
> other <abbr>DVC repositories</abbr> (e.g. hosted on Github).
@@ -41,11 +41,11 @@ while `out` can be used to specify the directory and/or file name desired for
the downloaded data. If an existing directory is specified, the file or
directory will be placed inside.

[DVC-files](/doc/user-guide/dvc-file-format) support references to data in an
[`.dvc` files](/doc/user-guide/dvc-file-format) support references to data in an
external location, see
[External Dependencies](/doc/user-guide/external-dependencies). In such a
DVC-file, the `deps` field stores the remote URL, and the `outs` field contains
the corresponding local path in the <abbr>workspace</abbr>. It records enough
[External Dependencies](/doc/user-guide/external-dependencies). In such a `.dvc`
file, the `deps` field stores the remote URL, and the `outs` field contains the
corresponding local path in the <abbr>workspace</abbr>. It records enough
metadata about the imported data to enable DVC efficiently determining whether
the local copy is out of date.

@@ -102,7 +102,7 @@ $ dvc run -d https://example.com/path/to/data.csv \
wget https://example.com/path/to/data.csv -O data.csv
```

Both methods generate a [DVC-files](/doc/user-guide/dvc-file-format) with an
Both methods generate a [`.dvc` files](/doc/user-guide/dvc-file-format) with an
external dependency, but the one created by `dvc import-url` preserves the
connection to the data source. We call this an _import stage_.

@@ -113,9 +113,9 @@ up to date from the external data source.
## Options

- `-f <filename>`, `--file <filename>` - specify a path and/or file name for the
DVC-file created by this command (e.g. `-f stages/stage.dvc`). This overrides
the default file name: `<file>.dvc`, where `<file>` is the desired file name
of the imported data (`out`).
`.dvc` file created by this command (e.g. `-f stages/stage.dvc`). This
overrides the default file name: `<file>.dvc`, where `<file>` is the desired
file name of the imported data (`out`).

- `-h`, `--help` - prints the usage/help message, and exit.

@@ -167,7 +167,7 @@ To track the changes with git, run:
git add data.xml.dvc data/.gitignore
```

Let's take a look at the resulting stage file (DVC-file) `data.xml.dvc`:
Let's take a look at the resulting stage file (`.dvc` file) `data.xml.dvc`:

```yaml
md5: 61e80c38c1ce04ed2e11e331258e6d0d
@@ -183,7 +183,7 @@ outs:
persist: false
```

The `etag` field in the DVC-file contains the
The `etag` field in the `.dvc` file contains the
[ETag](https://en.wikipedia.org/wiki/HTTP_ETag) recorded from the HTTP request.
If the remote file changes, its ETag will be different. This metadata allows DVC
to determine whether its necessary to download it again.
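
The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DVC's implementation: the function name `dependency_changed`, the dictionary layout, and the sample ETag values are assumptions made for the example. Only the field names `etag`, `md5`, and `path` come from the `.dvc` files shown in this document.

```python
def dependency_changed(recorded: dict, current: dict) -> bool:
    """Compare the checksum-like field recorded for a dependency with a fresh one.

    Hypothetical helper: `recorded` mimics one entry under `deps` in a
    `.dvc` file, `current` holds the value observed at the data source now.
    """
    for field in ("etag", "md5"):
        if field in recorded:
            # A differing (or missing) current value means the source changed.
            return recorded[field] != current.get(field)
    # Nothing was recorded to compare against, so treat it as changed.
    return True


# Sample values are made up for illustration.
recorded = {"path": "https://example.com/path/to/data.xml", "etag": '"f432e270"'}
unchanged = dependency_changed(recorded, {"etag": '"f432e270"'})
changed = dependency_changed(recorded, {"etag": '"0a1b2c3d"'})
print(unchanged, changed)  # prints: False True
```

The same comparison would apply to the `md5` field used when the dependency is a local file rather than an HTTP URL.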
@@ -241,7 +241,7 @@ outs:
persist: false
```

The DVC-file is nearly the same as in the previous example. The difference is
The `.dvc` file is nearly the same as in the previous example. The difference is
that the dependency (`deps`) now references the local file in the data store
directory we created previously. (Its `path` has the URL for the data store.)
And instead of an `etag` we have an `md5` hash value. We did this so its easy to
@@ -309,8 +309,8 @@ Data and pipelines are up to date.

In the data store directory, edit `data.xml`. It doesn't matter what you change,
as long as it remains a valid XML file, because any change will result in a
different dependency file hash (`md5`) in the import stage DVC-file. Once we do
so, we can run `dvc update` to make sure the import stage is up to date:
different dependency file hash (`md5`) in the import stage `.dvc` file. Once we
do so, we can run `dvc update` to make sure the import stage is up to date:

```dvc
$ dvc update data.xml.dvc
35 changes: 18 additions & 17 deletions content/docs/command-reference/import.md
@@ -2,7 +2,7 @@

Download a file or directory tracked by DVC or by Git into the
<abbr>workspace</abbr>. It also creates a
[DVC-file](/doc/user-guide/dvc-file-format) with information about the data
[`.dvc` file](/doc/user-guide/dvc-file-format) with information about the data
source, which can later be used to [update](/doc/command-reference/update) the
import.

@@ -44,7 +44,7 @@ The `path` argument is used to specify the location of the target to be
downloaded within the source repository at `url`. `path` can specify any file or
directory in the source repo, including those tracked by DVC, or by Git. Note
that DVC-tracked targets should be found in a
[DVC-file](/doc/user-guide/dvc-file-format) of the project.
[`.dvc` file](/doc/user-guide/dvc-file-format) of the project.

⚠️ The project should have a default
[DVC remote](/doc/command-reference/remote), containing the actual data for this
@@ -55,15 +55,16 @@ command to work.

After running this command successfully, the imported data is placed in the
current working directory (unless `-o` is used) with its original file name e.g.
`data.txt`. An _import stage_ (DVC-file) is also created in the same location,
extending the name of the imported data e.g. `data.txt.dvc` – similar to having
used `dvc run` to generate the data as a stage <abbr>output</abbr>.
`data.txt`. An _import stage_ (`.dvc` file) is also created in the same
location, extending the name of the imported data e.g. `data.txt.dvc` – similar
to having used `dvc run` to generate the data as a stage <abbr>output</abbr>.

DVC-files support references to data in an external DVC repository (hosted on a
Git server). In such a DVC-file, the `deps` field specifies the remote `url` and
data `path`, and the `outs` field contains the corresponding local path in the
<abbr>workspace</abbr>. It records enough metadata about the imported data to
enable DVC efficiently determining whether the local copy is out of date.
`.dvc` files support references to data in an external DVC repository (hosted on
a Git server). In such a `.dvc` file, the `deps` field specifies the remote
`url` and data `path`, and the `outs` field contains the corresponding local
path in the <abbr>workspace</abbr>. It records enough metadata about the
imported data to enable DVC efficiently determining whether the local copy is
out of date.

To actually
[track the data](https://dvc.org/doc/tutorials/get-started/data-versioning),
@@ -112,8 +113,8 @@ Importing 'data/data.xml (git@github.com:iterative/example-get-started)'

In contrast with `dvc get`, this command doesn't just download the data file,
but it also creates an import stage
([DVC-file](/doc/user-guide/dvc-file-format)) with a link to the data source (as
explained in the description above). (This import stage can later be used to
([`.dvc` file](/doc/user-guide/dvc-file-format)) with a link to the data source
(as explained in the description above). (This import stage can later be used to
[update](/doc/command-reference/update) the import.) Check `data.xml.dvc`:

```yaml
@@ -152,7 +153,7 @@ Importing
```

When using this option, the import stage
([DVC-file](/doc/user-guide/dvc-file-format)) will also have a `rev` subfield
([`.dvc` file](/doc/user-guide/dvc-file-format)) will also have a `rev` subfield
under `repo`:

```yaml
@@ -166,7 +167,7 @@ deps:

If `rev` is a Git branch or tag (where the underlying commit changes), the data
source may have updates at a later time. To bring it up to date if so (and
update `rev_lock` in the DVC-file), simply use `dvc update <stage>.dvc`. If
update `rev_lock` in the `.dvc` file), simply use `dvc update <stage>.dvc`. If
`rev` is a specific commit hash (does not change), `dvc update` without options
will not have an effect on the import stage. You may force-update it to a
different commit with `dvc update --rev`:
@@ -184,7 +185,7 @@ If you take a look at our
[dataset registry](https://github.com/iterative/dataset-registry)
<abbr>project</abbr>, you'll see that it's organized into different directories
such as `tutorial/ver` and `use-cases/`, and these contain
[DVC-files](/doc/user-guide/dvc-file-format) that track different datasets.
[`.dvc` files](/doc/user-guide/dvc-file-format) that track different datasets.
Given this simple structure, its data files can be easily shared among several
other projects using `dvc get` and `dvc import`. For example:

@@ -205,7 +206,7 @@ $ dvc import git@github.com:iterative/dataset-registry.git \
`dvc import` provides a better way to incorporate data files tracked in external
<abbr>DVC repositories</abbr> because it saves the connection between the
current project and the source repo. This means that enough information is
recorded in an import stage (DVC-file) in order to
recorded in an import stage (`.dvc` file) in order to
[reproduce](/doc/command-reference/repro) downloading of this same data version
in the future, where and when needed. This is achieved with the `repo` field,
for example (matching the import command above):
@@ -244,7 +245,7 @@ Importing ...
> Note that Git-tracked files can be imported from DVC repos as well.

The file is imported, and along with it, an import stage
([DVC-file](/doc/user-guide/dvc-file-format)) file is created. Check
([`.dvc` file](/doc/user-guide/dvc-file-format)) file is created. Check
`it-standards.csv.dvc`:

```yaml
10 changes: 5 additions & 5 deletions content/docs/command-reference/list.md
@@ -16,12 +16,12 @@ positional arguments:

## Description

DVC, by effectively replacing data files, models, directories with DVC-files
DVC, by effectively replacing data files, models, directories with `.dvc` files
(`.dvc`), hides actual locations and names. This means that you don't see data
files when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
Github), you just see the DVC-files. This makes it hard to navigate the project
to find <abbr>data artifacts</abbr> for use with `dvc get`, `dvc import`, or
`dvc.api`.
Github), you just see the `.dvc` files. This makes it hard to navigate the
project to find <abbr>data artifacts</abbr> for use with `dvc get`,
`dvc import`, or `dvc.api`.

`dvc list` prints a virtual view of a DVC repository, as if files and
directories [tracked by DVC](/doc/use-cases/versioning-data-and-model-files)
@@ -97,7 +97,7 @@ project's page, you will see a similar list, except that `model.pkl` will be
missing. That's because its tracked by DVC and not visible to Git. You can find
it in the
[`train.dvc`](https://github.com/iterative/example-get-started/blob/master/train.dvc)
DVC-file (`outs` field).
`.dvc` file (`outs` field).

We can now, for example, download the model file with:

38 changes: 23 additions & 15 deletions content/docs/command-reference/pull.md
@@ -3,7 +3,7 @@
Download tracked files or directories from
[remote storage](/doc/command-reference/remote) to the <abbr>cache</abbr> and
<abbr>workspace</abbr>, based on the current
[DVC-files](/doc/user-guide/dvc-file-format).
[`.dvc` files](/doc/user-guide/dvc-file-format).

## Synopsis

@@ -13,8 +13,8 @@ usage: dvc pull [-h] [-q | -v] [-j <number>]
[targets [targets ...]]

positional arguments:
targets Limit command scope to these DVC-files. Using -R,
directories to search DVC-files in can also be given.
targets Limit command scope to these `.dvc` files. Using -R,
directories to search `.dvc` files in can also be given.
```

## Description
@@ -37,17 +37,17 @@ remote.

With no arguments, just `dvc pull` or `dvc pull --remote <name>`, it downloads
only the files (or directories) missing from the workspace by searching all
[DVC-files](/doc/user-guide/dvc-file-format) currently in the
[`.dvc` files](/doc/user-guide/dvc-file-format) currently in the
<abbr>project</abbr>. It will not download files associated with earlier commits
in the <abbr>repository</abbr> (if using Git), nor will it download files that
have not changed.

The command `dvc status -c` can list files referenced in current DVC-files, but
missing in the <abbr>cache</abbr>. It can be used to see what files `dvc pull`
would download.
The command `dvc status -c` can list files referenced in current `.dvc` files,
but missing in the <abbr>cache</abbr>. It can be used to see what files
`dvc pull` would download.

If one or more `targets` are specified, DVC only considers the files associated
with those DVC-files. Using the `--with-deps` option, DVC tracks dependencies
with those `.dvc` files. Using the `--with-deps` option, DVC tracks dependencies
backward from the target [stage files](/doc/command-reference/run), through the
corresponding [pipelines](/doc/command-reference/pipeline), to find data files
to pull.
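
The backward search that `--with-deps` performs can be pictured with a small graph walk. The sketch below is an assumption-laden illustration, not DVC code: the function `upstream_stages`, the dictionary representation of a pipeline, and the `prepare.dvc` stage are hypothetical. The names `matrix-train.p.dvc` and `model.p.dvc` are borrowed from the example later in this file's diff.

```python
def upstream_stages(target, deps_of):
    """Return the target stage plus every stage it depends on, directly or not.

    `deps_of` maps a stage file name to the stage files it depends on.
    Stages that come later than the target are never visited.
    """
    seen = []
    stack = [target]
    while stack:
        stage = stack.pop()
        if stage in seen:
            continue
        seen.append(stage)
        # Walk backward: only follow edges toward dependencies.
        stack.extend(deps_of.get(stage, []))
    return seen


# Hypothetical three-stage pipeline for illustration.
pipeline = {
    "prepare.dvc": [],
    "matrix-train.p.dvc": ["prepare.dvc"],
    "model.p.dvc": ["matrix-train.p.dvc"],
}
selected = upstream_stages("matrix-train.p.dvc", pipeline)
print(selected)  # prints: ['matrix-train.p.dvc', 'prepare.dvc']
```

Because the walk only follows dependencies, `model.p.dvc` is left out when the target is `matrix-train.p.dvc`, which matches the behavior the text describes for stages later than the targets.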
@@ -58,8 +58,8 @@ reflinks or hardlinks to put it in the workspace without copying. See

## Options

- `-a`, `--all-branches` - determines the files to download by examining
DVC-files in all Git branches instead of just those present in the current
- `-a`, `--all-branches` - determines the files to download by examining `.dvc`
files in all Git branches instead of just those present in the current
workspace. It's useful if branches are used to track experiments or project
checkpoints. Note that this can be combined with `-T` below, for example using
the `-aT` flag.
@@ -74,19 +74,19 @@ reflinks or hardlinks to put it in the workspace without copying. See
entire existing commit history of the project.

- `-d`, `--with-deps` - determines files to download by tracking dependencies to
the target DVC-files (stages). If no `targets` are provided, this option is
the target `.dvc` files (stages). If no `targets` are provided, this option is
ignored. By traversing all stage dependencies, DVC searches backward from the
target stages in the corresponding pipelines. This means DVC will not pull
files referenced in later stages than the `targets`.

- `-R`, `--recursive` - determines the files to pull by searching each target
directory and its subdirectories for DVC-files to inspect. If there are no
directory and its subdirectories for `.dvc` files to inspect. If there are no
directories among the `targets`, this option is ignored.

- `-f`, `--force` - does not prompt when removing workspace files, which occurs
when these file no longer match the current DVC-file references. This option
surfaces behavior from the `dvc fetch` and `dvc checkout` commands because
`dvc pull` in effect performs those 2 functions in a single command.
when these file no longer match the current `.dvc` file references. This
option surfaces behavior from the `dvc fetch` and `dvc checkout` commands
because `dvc pull` in effect performs those 2 functions in a single command.

- `-r <name>`, `--remote <name>` - name of the
[remote storage](/doc/command-reference/remote) to pull from (see
@@ -217,8 +217,16 @@ $ dvc remote list
r1 ssh://_username_@_host_/path/to/dvc/remote/storage
```

<<<<<<< HEAD With the first `dvc pull` we specified a stage in the middle of
this pipeline (`matrix-train.p.dvc`) while using `--with-deps`. DVC started with
that `.dvc` file and searched backwards through the pipeline for data files to
download. Because the `model.p.dvc` stage occurs later, its data was not pulled.
=======

> DVC supports several
> [remote types](/doc/command-reference/remote/add#supported-storage-types).
>
> > > > > > > c8b720a9017ca9db2caae7fc9f521f6192fc4f4c

To download DVC-tracked data from a specific DVC remote, use the `--remote`
(`-r`) option of `dvc pull`: