guide: update old links to DVC Files and review related terms (prep for dvc.yaml 2.0) #2047

Merged 2 commits on Dec 22, 2020
4 changes: 2 additions & 2 deletions content/blog/2020-04-16-april-20-community-gems.md
Original file line number Diff line number Diff line change
@@ -148,5 +148,5 @@ look like hashes (because, well, they are). Luckily, DVC handles all the
conversions between the filenames in your local workspace and these hashes.

To get some more intuition about this, check out some of our
[docs](https://dvc.org/doc/user-guide/dvc-files-and-directories) about how DVC
organizes files.
[docs](https://dvc.org/doc/user-guide/dvc-internals) about how DVC organizes
files.
6 changes: 3 additions & 3 deletions content/docs/command-reference/config.md
@@ -118,9 +118,9 @@ This is the main section with the general config options:
(default) and `false`.

- `core.autostage` - if enabled, DVC will automatically stage (`git add`)
[DVC metafiles](/doc/user-guide/dvc-files-and-directories) created or modified
by DVC commands (`dvc add`, `dvc run`, etc.). The files will not be committed.
Accepts values `true` and `false` (default).
[DVC files](/doc/user-guide/dvc-files) created or modified by DVC commands
(`dvc add`, `dvc run`, etc.). The files will not be committed. Accepts values
`true` and `false` (default).

### remote

11 changes: 5 additions & 6 deletions content/docs/command-reference/destroy.md
@@ -1,8 +1,7 @@
# destroy

Remove all
[DVC files and directories](/doc/user-guide/dvc-files-and-directories) from a
<abbr>DVC project</abbr>.
Remove all [DVC files](/doc/user-guide/dvc-files) and
[internals](/doc/user-guide/dvc-internals) from a <abbr>DVC project</abbr>.

## Synopsis

@@ -22,9 +21,9 @@ Note that the <abbr>cache directory</abbr> will be removed as well, unless it's
cache, DVC will replace them with the latest versions of the actual files and
directories first, so that your data is intact after the project's destruction.

> Refer to
> [DVC files and directories](/doc/user-guide/dvc-files-and-directories) for
> more details on the directories and files deleted by this command.
> Refer to [DVC files](/doc/user-guide/dvc-files) and
> [internals](/doc/user-guide/dvc-internals) for more details on the directories
> and files deleted by this command.

## Options

6 changes: 3 additions & 3 deletions content/docs/command-reference/init.md
@@ -49,9 +49,9 @@ initializing DVC in the Git repo root:
- Repository maintainers might not allow a top level `.dvc/` directory,
especially if DVC is already being used by several sub-projects (monorepo).

- DVC [internals](/doc/user-guide/dvc-files-and-directories) (config file, cache
directory, etc.) would be shared across different subdirectories. This forces
all of them to use the same DVC settings and
- DVC [internals](/doc/user-guide/dvc-internals) (config file, cache directory,
etc.) would be shared across different subdirectories. This forces all of them
to use the same DVC settings and
[remote storage](/doc/command-reference/remote).

- By default, DVC commands like `dvc pull` and `dvc repro` explore the whole
22 changes: 10 additions & 12 deletions content/docs/command-reference/install.md
@@ -22,10 +22,10 @@ etc.) doesn't have DVC initialized (no `.dvc/` directory present).
Namely:

**Checkout**: For any commit hash, branch or tag, `git checkout` retrieves the
[DVC-files](/doc/user-guide/dvc-files-and-directories) corresponding to that
version. The project's DVC-files in turn refer to data stored in
<abbr>cache</abbr>, but not necessarily in the <abbr>workspace</abbr>. Normally,
it would be necessary to use `dvc checkout` to update the workspace accordingly.
[DVC files](/doc/user-guide/dvc-files) corresponding to that version. The
project's DVC-files in turn refer to data stored in <abbr>cache</abbr>, but not
necessarily in the <abbr>workspace</abbr>. Normally, it would be necessary to
use `dvc checkout` to update the workspace accordingly.

This hook automates `dvc checkout` after `git checkout`.

@@ -168,10 +168,9 @@ $ dvc pull --all-branches --all-tags
## Example: Checkout both Git and DVC

Switching from one Git commit to another (with `git checkout`) may change the
set of [DVC-files](/doc/user-guide/dvc-files-and-directories) in the
<abbr>workspace</abbr>. This would mean that the currently present data files
and directories no longer matches project's version (which can be fixed with
`dvc checkout`).
set of [DVC files](/doc/user-guide/dvc-files) in the <abbr>workspace</abbr>.
This would mean that the currently present data files and directories no longer
matches project's version (which can be fixed with `dvc checkout`).

Let's first list the available tags in the _Get Started_ repo:

@@ -220,10 +219,9 @@ We also see that the first `dvc status` tells us about differences between the
project's <abbr>cache</abbr> and the data files currently in the workspace. Git
changed the DVC-files in the workspace, which changed references to data files.
`dvc status` first informed us that the data files in the workspace no longer
matched the hash values in the corresponding `.dvc` and `dvc.lock`
[files](/doc/user-guide/dvc-files-and-directories). Running `dvc checkout` then
brings them up to date, and a second `dvc status` tells us that the data files
now do match the DVC files.
matched the hash values in the corresponding `.dvc` and `dvc.lock` files.
Running `dvc checkout` then brings them up to date, and a second `dvc status`
tells us that the data files now do match the DVC files.

```dvc
$ git checkout master
11 changes: 5 additions & 6 deletions content/docs/command-reference/list.md
@@ -17,12 +17,11 @@ positional arguments:
## Description

A side-effect of DVC is that it hides actual data paths, by effectively
replacing files and directories with
[metafiles](/doc/user-guide/dvc-files-and-directories). So you don't see data
files/dirs when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
GitHub), you just see the `dvc.yaml` and `.dvc` files. This can make it hard to
navigate the project, for example to find files or directories for use with
`dvc get`, `dvc import`, or `dvc.api` functions.
replacing files and directories with [DVC files](/doc/user-guide/dvc-files). So
you don't see data files/dirs when you browse a <abbr>DVC repository</abbr> on
Git hosting (e.g. GitHub), you just see the `dvc.yaml` and `.dvc` files. This
can make it hard to navigate the project, for example to find files or
directories for use with `dvc get`, `dvc import`, or `dvc.api` functions.

This command produces a view of a DVC repository, as if files and directories
tracked by DVC were found directly in the Git repo. Its output is equivalent to
2 changes: 1 addition & 1 deletion content/docs/command-reference/metrics/show.md
@@ -18,7 +18,7 @@ positional arguments:
## Description

Finds and prints all metrics in the <abbr>project</abbr> by examining all of its
[DVC-files](/doc/user-guide/dvc-files-and-directories).
[DVC files](/doc/user-guide/dvc-files).

If `targets` are provided, it will show those specific metrics files instead.
With the `-a` or `-T` options, this command shows the different metrics values
3 changes: 1 addition & 2 deletions content/docs/command-reference/remote/modify.md
@@ -88,8 +88,7 @@ The following config options are available for all remote types:
DVC will recalculate the file hashes upon download (e.g. `dvc pull`) to make
sure that these haven't been modified, or corrupted during download. It may
slow down the aforementioned commands. The calculated hash is compared to the
value saved in the corresponding
[DVC-file](/doc/user-guide/dvc-files-and-directories).
value saved in the corresponding [DVC file](/doc/user-guide/dvc-files).

> Note that this option is enabled on **Google Drive** remotes by default.
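
The comparison described above can be sketched in Python for illustration. This is not DVC's implementation: the helper names `file_md5` and `matches_recorded_hash` are hypothetical, and the sketch assumes a plain MD5 over the file's raw bytes, which may differ from how DVC hashes some files.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        # Read fixed-size chunks so large files are never fully loaded in memory.
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_md5):
    """Compare a freshly computed hash with the value recorded for the file."""
    return file_md5(path) == recorded_md5
```

Recomputing the hash requires reading every downloaded byte again, which is consistent with the note that this option may slow down commands such as `dvc pull`.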

4 changes: 2 additions & 2 deletions content/docs/command-reference/status.md
@@ -39,8 +39,8 @@ It accepts paths to tracked files or directories (including paths inside tracked
directories), `.dvc` files, and stage names (found in `dvc.yaml`).

The `--all-branches`, `--all-tags`, and `--all-commits` options enable comparing
[metafiles](/doc/user-guide/dvc-files-and-directories) referenced in multiple
Git commits at once.
[DVC files](/doc/user-guide/dvc-files) referenced in multiple Git commits at
once.

If no differences are detected, `dvc status` prints
`Data and pipelines are up to date.` or
4 changes: 2 additions & 2 deletions content/docs/install/plugins.md
@@ -1,8 +1,8 @@
# IDE Plugins and Syntax Highlighting

When files or directories are added to the project, or stages to a pipeline,
[DVC metafiles](/doc/user-guide/dvc-files-and-directories) are created. These
use a simple YAML format.
[DVC files](/doc/user-guide/dvc-files) are created. These use a simple YAML
format.

We maintain a [schema](https://github.com/iterative/dvcyaml-schema) for
`dvc.yaml` that can enable IDE syntax checks and auto-completion.
6 changes: 3 additions & 3 deletions content/docs/start/data-versioning.md
@@ -50,9 +50,9 @@ $ dvc add data/data.xml

DVC stores information about the added file (or a directory) in a special `.dvc`
file named `data/data.xml.dvc`, a small text file with a human-readable
[format](/doc/user-guide/dvc-files-and-directories#dvc-files). This file can be
easily versioned like source code with Git, as a placeholder for the original
data (which gets listed in `.gitignore`):
[format](/doc/user-guide/dvc-files#dvc-files). This file can be easily versioned
like source code with Git, as a placeholder for the original data (which gets
listed in `.gitignore`):

```dvc
$ git add data/data.xml.dvc data/.gitignore
2 changes: 1 addition & 1 deletion content/docs/use-cases/data-registries.md
@@ -183,7 +183,7 @@ $ git commit -am "Add 1,000 more songs to music/ dataset."

Iterating on this process for several datasets can give shape to a robust
registry. The result is basically a repo that versions a set of
[metafiles](/doc/user-guide/dvc-files-and-directories). Let's see an example:
[metafiles](/doc/user-guide/dvc-files). Let's see an example:

```dvc
$ tree --filelimit=10
4 changes: 2 additions & 2 deletions content/docs/use-cases/sharing-data-and-model-files.md
@@ -67,8 +67,8 @@ with the `dvc push` command:
$ dvc push
```

Code and [DVC-files](/doc/user-guide/dvc-files-and-directories) can be safely
committed and pushed with Git.
Code and [DVC files](/doc/user-guide/dvc-files) can be safely committed and
pushed with Git.

## Download code

11 changes: 5 additions & 6 deletions content/docs/use-cases/versioning-data-and-model-files/index.md
@@ -22,12 +22,11 @@ work!
![](/img/project-versions.png) _DVC matches the right versions of data, code,
and models for you 💘._

DVC enables data _versioning through codification_. You write simple
[metafiles](/doc/user-guide/dvc-files-and-directories) once, describing what
datasets, ML artifacts, etc. to track. This metadata can be put in Git in lieu
of large files. Now you can use DVC to create
[snapshots](/doc/command-reference/add) of the data,
[restore](/doc/command-reference/checkout) previous versions,
DVC enables data _versioning through codification_. You produce simple
[metafiles](/doc/user-guide/dvc-files) once, describing what datasets, ML
artifacts, etc. to track. This metadata can be put in Git in lieu of large
files. Now you can use DVC to create [snapshots](/doc/command-reference/add) of
the data, [restore](/doc/command-reference/checkout) previous versions,
[reproduce](/doc/command-reference/repro) experiments, record evolving
[metrics](/doc/command-reference/metrics), and more!

22 changes: 10 additions & 12 deletions content/docs/use-cases/versioning-data-and-model-files/tutorial.md
@@ -71,9 +71,8 @@ $ pip install -r requirements.txt
The repository you cloned is already DVC-initialized. It already contains a
`.dvc/` directory with the `config` and `.gitignore` files. These and other
files and directories are hidden from user, as typically there's no need to
interact with them directly. See
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn
more.
interact with them directly. See [DVC Internals](/doc/user-guide/dvc-internals)
to learn more.

</details>

@@ -137,9 +136,8 @@ intermediate results, etc. It tells Git to ignore the directory and puts it into
the <abbr>cache</abbr> (while keeping a
[file link](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to it in the <abbr>workspace</abbr>, so you can continue working the same way as
before). This is achieved by creating a simple human-readable
[DVC-file](/doc/user-guide/dvc-files-and-directories) that serves as a pointer
to the cache.
before). This is achieved by creating a simple human-readable `.dvc` file that
serves as a pointer to the cache.

Next, we train our first model with `train.py`. Because of the small dataset,
this training process should be small enough to run on most computers in a
@@ -174,8 +172,8 @@ As we mentioned briefly, DVC does not commit the `data/` directory and
then `git commit` DVC-files that contain file hashes that point to cached data.

In this case we created `data.dvc` and `model.h5.dvc`. Refer to
[DVC Files](/doc/user-guide/dvc-files-and-directories) to learn more about how
these files work.
[DVC Files](/doc/user-guide/dvc-files#dvc-files) to learn more about how these
files work.

</details>

@@ -284,14 +282,14 @@ the `v2.0` tag.

<details>

### Expand to learn more about DVC internals
### Expand to learn more about DVC files

As we have learned already, DVC keeps data files out of Git (by adjusting
`.gitignore`) and puts them into the <abbr>cache</abbr> (usually it's a
`.dvc/cache` directory inside the repository). Instead, DVC creates
[DVC-files](/doc/user-guide/dvc-files-and-directories). These text files serve
as data placeholders that point to the cached files, and they can be easily
version controlled with Git.
[DVC files](/doc/user-guide/dvc-files). These text files serve as data
placeholders that point to the cached files, and they can be easily version
controlled with Git.

When we run `git checkout` we restore pointers (DVC-files) first. Then, when we
run `dvc checkout`, we use these pointers to put the right data in the right
5 changes: 2 additions & 3 deletions content/docs/user-guide/basic-concepts/dvc-project.md
@@ -15,6 +15,5 @@ match:

Initialized by running `dvc init` in the **workspace** (typically a Git
repository). It will contain the
[`.dvc/` directory](/doc/user-guide/dvc-files-and-directories), as well as
`dvc.yaml` and `.dvc` files created with commands such as `dvc add` or
`dvc run`.
[`.dvc/` directory](/doc/user-guide/dvc-internals), as well as `dvc.yaml` and
`.dvc` files created with commands such as `dvc add` or `dvc run`.
4 changes: 2 additions & 2 deletions content/docs/user-guide/contributing/docs.md
@@ -169,8 +169,8 @@ is installed when `yarn` runs (see [dev env](#development-environment)).
`dvc`, `yaml`, or `diff` custom languages. `usage` is employed to show the
`dvc --help` output for each command reference. `dvc` can be used to show
examples of commands and their output in a terminal session. `yaml` is used to
show [DVC-file](/doc/user-guide/dvc-files-and-directories) contents or other
YAML data. `diff` is used mainly for examples of `git diff` output.
show [DVC file](/doc/user-guide/dvc-files) contents, or other YAML data.
`diff` is used mainly for examples of `git diff` output.

> Check out the `.md` source code of any command reference to get a better idea,
> for example in
5 changes: 2 additions & 3 deletions content/docs/user-guide/dvc-files.md
@@ -218,8 +218,7 @@ It's created or updated by DVC commands such as `dvc run` and `dvc repro`.
- `dvc.lock` is needed internally for several DVC commands to operate, such as
`dvc checkout`, `dvc get`, and `dvc import`.

Here's an example `dvc.lock` based on the one in [`dvc.yaml`](#dvcyaml-file)
above:
Here's an example `dvc.lock` based on the one in `dvc.yaml` above:

```yaml
stages:
@@ -248,7 +247,7 @@ Regular <abbr>dependencies</abbr> and all kinds of <abbr>outputs</abbr>
[plots](/doc/command-reference/plots) files) are also listed (per stage) in
`dvc.lock`, but with an additional field to store the hash value of each file or
directory tracked by DVC. Specifically: `md5`, `etag`, or `checksum` (same as in
`deps` and `outs` entries of [`.dvc` files](#dvc-files)).
`deps` and `outs` entries of `.dvc` files).

Full <abbr>parameters</abbr> (key and value) are listed separately under
`params`, grouped by parameters file.
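
The structure described above can be illustrated with a short Python sketch that walks a `dvc.lock`-like mapping. The stage name, paths, hash values, and parameter name are made up for illustration, and the helpers `recorded_hashes` and `recorded_params` are hypothetical, not DVC APIs; the mapping is written as a Python dict to keep the example free of third-party YAML parsers.

```python
# A dvc.lock-like structure as a Python dict. All names and values are invented.
LOCK = {
    "stages": {
        "featurize": {
            "cmd": "python featurize.py",
            "deps": [
                {"path": "data/prepared", "md5": "0123456789abcdef0123456789abcdef"},
            ],
            "params": {"params.yaml": {"featurize.max_features": 3000}},
            "outs": [
                {"path": "data/features", "md5": "fedcba9876543210fedcba9876543210"},
            ],
        }
    }
}

# Hash fields named in the text above.
HASH_FIELDS = ("md5", "etag", "checksum")


def recorded_hashes(lock):
    """Yield (stage, kind, path, hash_field, value) for each dependency and output."""
    for stage_name, stage in lock.get("stages", {}).items():
        for kind in ("deps", "outs"):
            for entry in stage.get(kind, []):
                for field in HASH_FIELDS:
                    if field in entry:
                        yield stage_name, kind, entry["path"], field, entry[field]


def recorded_params(lock):
    """Yield (stage, params_file, key, value); params are grouped by parameters file."""
    for stage_name, stage in lock.get("stages", {}).items():
        for params_file, values in stage.get("params", {}).items():
            for key, value in values.items():
                yield stage_name, params_file, key, value
```

Keeping parameters in their own mapping, keyed by parameters file, mirrors the separation the text describes between hashed files and full key/value parameters.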
10 changes: 5 additions & 5 deletions content/docs/user-guide/how-to/merge-conflicts.md
@@ -1,15 +1,15 @@
---
title: 'How to Merge Conflicts'
description: 'Git merge conflicts can happen in DVC metafiles when combining
changes from multiple team members.'
description: 'Git merge conflicts can happen in DVC files when combining changes
from multiple team members.'
---

# How to Merge Conflicts in DVC Metafiles
# How to Merge Conflicts in DVC Files

Sometimes multiple members of a team might work on the same DVC-tracked
data. And when the time comes to combine their changes, merge conflicts can
happen in Git-tracked [metafiles](/doc/user-guide/dvc-files-and-directories),
which need to be resolved.
happen in Git-tracked [DVC files](/doc/user-guide/dvc-files), which need to be
resolved.

## `dvc.yaml`

3 changes: 1 addition & 2 deletions content/docs/user-guide/large-dataset-optimization.md
@@ -4,8 +4,7 @@ In order to track the data files and directories added with `dvc add` or
`dvc run`, DVC moves all these files to the <abbr>cache</abbr>. A
<abbr>project</abbr>'s cache is the hidden storage (by default located in
`.dvc/cache`) for files that are tracked by DVC, and their different versions.
(See `dvc cache` and
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) for more
(See `dvc cache` and [DVC Internals](/doc/user-guide/dvc-internals) for more
details.)

However, the versions of the tracked files that
4 changes: 2 additions & 2 deletions content/docs/user-guide/related-technologies.md
@@ -93,8 +93,8 @@ _Luigi_, etc.
visualizations.

- DVC has transparent design. Its
[internal files and directories](/doc/user-guide/dvc-files-and-directories)
have a human-readable format and can be easily reused by external tools.
[internal directories and files](/doc/user-guide/dvc-internals) have a
human-readable format and can be easily reused by external tools.

## Build automation tools

12 changes: 6 additions & 6 deletions content/docs/user-guide/what-is-dvc.md
@@ -24,9 +24,9 @@ can version experiments, manage large datasets, and make projects reproducible.

- **Data versioning** is enabled by replacing large files, dataset directories,
machine learning models, etc. with small
[metafiles](/doc/user-guide/dvc-files-and-directories) (easy to handle with
Git). These placeholders point to the original data, which is decoupled from
source code management.
[metafiles](/doc/user-guide/dvc-files) (easy to handle with Git). These
placeholders point to the original data, which is decoupled from source code
management.

- **Data storage**: On-premises or cloud storage can be used to store the
project's data separate from its code base. This is how data scientists can
@@ -50,10 +50,10 @@ can version experiments, manage large datasets, and make projects reproducible.

## DVC does not replace Git!

DVC metafiles such as `dvc.yaml` and `.dvc` files serve as placeholders to track
DVC files such as `dvc.yaml` and `.dvc` files serve as placeholders to track
large data files and directories for versioning (among other
[purposes](/doc/user-guide/dvc-files-and-directories)). These metafiles change
along with your data, and you can use Git to place them under
[purposes](/doc/user-guide/dvc-files)). These metafiles change along with your
data, and you can use Git to place them under
[version control](https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
as a proxy to the actual data versions, which are stored in the <abbr>DVC
cache</abbr> (outside of Git). This does not replace features of Git.