term: remove "stage files" (except from old blogs) #2048

Merged
merged 4 commits into from
Dec 22, 2020
14 changes: 7 additions & 7 deletions content/blog/2020-04-16-april-20-community-gems.md
@@ -129,12 +129,12 @@ modify its corresponding DVC file. It's handy so you don't rename a file in your
local workspace that's under DVC tracking without updating DVC to the change
(see an [example here](https://dvc.org/doc/command-reference/move#description)).
The function doesn't work on
[stage files](https://dvc.org/doc/tutorials/pipelines#define-stages) from DVC
pipelines. There's not currently an easy way to safely move stage files, and
it's an
["stage files"](https://dvc.org/doc/tutorials/pipelines#define-stages) from DVC
pipelines. There's not currently an easy way to safely move `dvc.yaml` files,
and it's an
[open issue we're working on](https://github.com/iterative/dvc/issues/1489).
Until then, you can manually update the stage file, or make a new one in the
desired location.
Until then, you can manually update `dvc.yaml`, or make a new one in the desired
location.

### Q: [I just starting using DVC and noticed that when I `dvc push` files to remote cloud storage, the directory in my remote looks like my DVC cache, not my local workspace directory. Is this right?](https://discordapp.com/channels/485586884165107732/485596304961962003/693740598498426930)

@@ -148,5 +148,5 @@ look like hashes (because, well, they are). Luckily, DVC handles all the
conversions between the filenames in your local workspace and these hashes.

To get some more intuition about this, check out some of our
[docs](https://dvc.org/doc/user-guide/dvc-files-and-directories) about how DVC
organizes files.
[docs](https://dvc.org/doc/user-guide/dvc-internals) about how DVC organizes
files.
2 changes: 1 addition & 1 deletion content/blog/2020-06-22-dvc-1-0-release.md
@@ -65,7 +65,7 @@ pipelines with data processing steps. People need to change the commands of the
pipeline often and it was not easy to do this with the old DVC-files.

In DVC 1.0, the DVC metafile format was changed in three big ways. First,
instead of multiple DVC stage files (`*.dvc`), each project has a single
instead of multiple DVC "stage files" (`*.dvc`), each project has a single
`dvc.yaml` file. By default, all stages go in this single YAML file.

Second, we made clear connections between the `dvc run` command (a helper to
2 changes: 1 addition & 1 deletion content/blog/2020-07-22-july-20-community-gems.md
@@ -53,7 +53,7 @@ If for some reason this won't work for your team, you can either downgrade to a
previous version, or use a workaround:

```dvc
$ dvc repro <.dvc stage file>
$ dvc repro <.dvc file>
```

substituting the appropriate `.dvc` file for your pipeline. DVC 1.0 is backwards
6 changes: 3 additions & 3 deletions content/docs/command-reference/config.md
@@ -118,9 +118,9 @@ This is the main section with the general config options:
(default) and `false`.

- `core.autostage` - if enabled, DVC will automatically stage (`git add`)
[DVC metafiles](/doc/user-guide/dvc-files-and-directories) created or modified
by DVC commands (`dvc add`, `dvc run`, etc.). The files will not be committed.
Accepts values `true` and `false` (default).
[DVC files](/doc/user-guide/dvc-files) created or modified by DVC commands
(`dvc add`, `dvc run`, etc.). The files will not be committed. Accepts values
`true` and `false` (default).

### remote

11 changes: 5 additions & 6 deletions content/docs/command-reference/destroy.md
@@ -1,8 +1,7 @@
# destroy

Remove all
[DVC files and directories](/doc/user-guide/dvc-files-and-directories) from a
<abbr>DVC project</abbr>.
Remove all [DVC files](/doc/user-guide/dvc-files) and
[internals](/doc/user-guide/dvc-internals) from a <abbr>DVC project</abbr>.

## Synopsis

@@ -22,9 +21,9 @@ Note that the <abbr>cache directory</abbr> will be removed as well, unless it's
cache, DVC will replace them with the latest versions of the actual files and
directories first, so that your data is intact after the project's destruction.

> Refer to
> [DVC files and directories](/doc/user-guide/dvc-files-and-directories) for
> more details on the directories and files deleted by this command.
> Refer to [DVC files](/doc/user-guide/dvc-files) and
> [internals](/doc/user-guide/dvc-internals) for more details on the directories
> and files deleted by this command.

## Options

2 changes: 1 addition & 1 deletion content/docs/command-reference/import.md
@@ -158,7 +158,7 @@ outs:
cache: true
```

Several of the values above are pulled from the original stage file
Several of the values above are pulled from the original `.dvc` file
`model.pkl.dvc` in the external DVC repository. The `url` and `rev_lock`
subfields under `repo` are used to save the origin and version of the
dependency, respectively.
6 changes: 3 additions & 3 deletions content/docs/command-reference/init.md
@@ -49,9 +49,9 @@ initializing DVC in the Git repo root:
- Repository maintainers might not allow a top level `.dvc/` directory,
especially if DVC is already being used by several sub-projects (monorepo).

- DVC [internals](/doc/user-guide/dvc-files-and-directories) (config file, cache
directory, etc.) would be shared across different subdirectories. This forces
all of them to use the same DVC settings and
- DVC [internals](/doc/user-guide/dvc-internals) (config file, cache directory,
etc.) would be shared across different subdirectories. This forces all of them
to use the same DVC settings and
[remote storage](/doc/command-reference/remote).

- By default, DVC commands like `dvc pull` and `dvc repro` explore the whole
22 changes: 10 additions & 12 deletions content/docs/command-reference/install.md
@@ -22,10 +22,10 @@ etc.) doesn't have DVC initialized (no `.dvc/` directory present).
Namely:

**Checkout**: For any commit hash, branch or tag, `git checkout` retrieves the
[DVC-files](/doc/user-guide/dvc-files-and-directories) corresponding to that
version. The project's DVC-files in turn refer to data stored in
<abbr>cache</abbr>, but not necessarily in the <abbr>workspace</abbr>. Normally,
it would be necessary to use `dvc checkout` to update the workspace accordingly.
[DVC files](/doc/user-guide/dvc-files) corresponding to that version. The
project's DVC-files in turn refer to data stored in <abbr>cache</abbr>, but not
necessarily in the <abbr>workspace</abbr>. Normally, it would be necessary to
use `dvc checkout` to update the workspace accordingly.

This hook automates `dvc checkout` after `git checkout`.

@@ -168,10 +168,9 @@ $ dvc pull --all-branches --all-tags
## Example: Checkout both Git and DVC

Switching from one Git commit to another (with `git checkout`) may change the
set of [DVC-files](/doc/user-guide/dvc-files-and-directories) in the
<abbr>workspace</abbr>. This would mean that the currently present data files
and directories no longer matches project's version (which can be fixed with
`dvc checkout`).
set of [DVC files](/doc/user-guide/dvc-files) in the <abbr>workspace</abbr>.
This would mean that the currently present data files and directories no longer
matches project's version (which can be fixed with `dvc checkout`).

Let's first list the available tags in the _Get Started_ repo:

@@ -220,10 +219,9 @@ We also see that the first `dvc status` tells us about differences between the
project's <abbr>cache</abbr> and the data files currently in the workspace. Git
changed the DVC-files in the workspace, which changed references to data files.
`dvc status` first informed us that the data files in the workspace no longer
matched the hash values in the corresponding `.dvc` and `dvc.lock`
[files](/doc/user-guide/dvc-files-and-directories). Running `dvc checkout` then
brings them up to date, and a second `dvc status` tells us that the data files
now do match the DVC files.
matched the hash values in the corresponding `.dvc` and `dvc.lock` files.
Running `dvc checkout` then brings them up to date, and a second `dvc status`
tells us that the data files now do match the DVC files.

```dvc
$ git checkout master
11 changes: 5 additions & 6 deletions content/docs/command-reference/list.md
@@ -17,12 +17,11 @@ positional arguments:
## Description

A side-effect of DVC is that it hides actual data paths, by effectively
replacing files and directories with
[metafiles](/doc/user-guide/dvc-files-and-directories). So you don't see data
files/dirs when you browse a <abbr>DVC repository</abbr> on Git hosting (e.g.
GitHub), you just see the `dvc.yaml` and `.dvc` files. This can make it hard to
navigate the project, for example to find files or directories for use with
`dvc get`, `dvc import`, or `dvc.api` functions.
replacing files and directories with [DVC files](/doc/user-guide/dvc-files). So
you don't see data files/dirs when you browse a <abbr>DVC repository</abbr> on
Git hosting (e.g. GitHub), you just see the `dvc.yaml` and `.dvc` files. This
can make it hard to navigate the project, for example to find files or
directories for use with `dvc get`, `dvc import`, or `dvc.api` functions.

This command produces a view of a DVC repository, as if files and directories
tracked by DVC were found directly in the Git repo. Its output is equivalent to
2 changes: 1 addition & 1 deletion content/docs/command-reference/metrics/show.md
@@ -18,7 +18,7 @@ positional arguments:
## Description

Finds and prints all metrics in the <abbr>project</abbr> by examining all of its
[DVC-files](/doc/user-guide/dvc-files-and-directories).
[DVC files](/doc/user-guide/dvc-files).

If `targets` are provided, it will show those specific metrics files instead.
With the `-a` or `-T` options, this command shows the different metrics values
3 changes: 1 addition & 2 deletions content/docs/command-reference/remote/modify.md
@@ -88,8 +88,7 @@ The following config options are available for all remote types:
DVC will recalculate the file hashes upon download (e.g. `dvc pull`) to make
sure that these haven't been modified, or corrupted during download. It may
slow down the aforementioned commands. The calculated hash is compared to the
value saved in the corresponding
[DVC-file](/doc/user-guide/dvc-files-and-directories).
value saved in the corresponding [DVC file](/doc/user-guide/dvc-files).

> Note that this option is enabled on **Google Drive** remotes by default.

4 changes: 2 additions & 2 deletions content/docs/command-reference/status.md
@@ -39,8 +39,8 @@ It accepts paths to tracked files or directories (including paths inside tracked
directories), `.dvc` files, and stage names (found in `dvc.yaml`).

The `--all-branches`, `--all-tags`, and `--all-commits` options enable comparing
[metafiles](/doc/user-guide/dvc-files-and-directories) referenced in multiple
Git commits at once.
[DVC files](/doc/user-guide/dvc-files) referenced in multiple Git commits at
once.

If no differences are detected, `dvc status` prints
`Data and pipelines are up to date.` or
4 changes: 2 additions & 2 deletions content/docs/command-reference/update.md
@@ -68,7 +68,7 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)'
```

As DVC mentions, the import stage (`.dvc` file) `model.pkl.dvc` is created. This
[stage file](/doc/command-reference/run) is frozen by default though, so to
[stage](/doc/command-reference/run) is frozen by default though, so to
[reproduce](/doc/command-reference/repro) it, we would need to run
`dvc unfreeze` on it first, then `dvc repro` (and `dvc freeze` again). Let's
just run `dvc update` on it instead:
@@ -103,7 +103,7 @@ Importing 'model.pkl (git@github.com:iterative/example-get-started)'
```

After this, the import stage (`.dvc` file) `model.pkl.dvc` is created. Let's try
to run `dvc update` on the given stage file, and see what happens.
to run `dvc update` on this file and see what happens.

```dvc
$ dvc update model.pkl.dvc
4 changes: 2 additions & 2 deletions content/docs/install/plugins.md
@@ -1,8 +1,8 @@
# IDE Plugins and Syntax Highlighting

When files or directories are added to the project, or stages to a pipeline,
[DVC metafiles](/doc/user-guide/dvc-files-and-directories) are created. These
use a simple YAML format.
[DVC files](/doc/user-guide/dvc-files) are created. These use a simple YAML
format.

We maintain a [schema](https://github.com/iterative/dvcyaml-schema) for
`dvc.yaml` that can enable IDE syntax checks and auto-completion.
9 changes: 4 additions & 5 deletions content/docs/start/data-pipelines.md
@@ -120,10 +120,9 @@ prepare:
```

- The last line, `python src/prepare.py ...`, is the command to run in this
stage, and it's saved to the stage file, as shown below.
stage, and it's saved to `dvc.yaml`, as shown below.

The resulting `prepare` stage in the `dvc.yaml` contains all of the information
above:
The resulting `prepare` stage contains all of the information above:

```yaml
stages:
@@ -145,7 +144,7 @@ There's no need to use `dvc add` for DVC to track stage outputs (`data/prepared`
in this case); `dvc run` already took care of this. You only need to run
`dvc push` if you want to save them to
[remote storage](/doc/tutorials/get-started/data-versioning#storing-and-sharing),
(usually along with `git commit` to version the stage file itself).
(usually along with `git commit` to version `dvc.yaml` itself).

## Dependency graphs (DAGs)

@@ -318,7 +317,7 @@ important problems:
## Visualize

Having built our pipeline, we need a good way to understand its structure.
Seeing a graph of connected stage files would help. DVC lets you do just that,
Seeing a graph of connected stages would help. DVC lets you do just that,
without leaving the terminal!

```dvc
6 changes: 3 additions & 3 deletions content/docs/start/data-versioning.md
@@ -50,9 +50,9 @@ $ dvc add data/data.xml

DVC stores information about the added file (or a directory) in a special `.dvc`
file named `data/data.xml.dvc`, a small text file with a human-readable
[format](/doc/user-guide/dvc-files-and-directories#dvc-files). This file can be
easily versioned like source code with Git, as a placeholder for the original
data (which gets listed in `.gitignore`):
[format](/doc/user-guide/dvc-files#dvc-files). This file can be easily versioned
like source code with Git, as a placeholder for the original data (which gets
listed in `.gitignore`):

```dvc
$ git add data/data.xml.dvc data/.gitignore
2 changes: 1 addition & 1 deletion content/docs/use-cases/data-registries.md
@@ -183,7 +183,7 @@ $ git commit -am "Add 1,000 more songs to music/ dataset."

Iterating on this process for several datasets can give shape to a robust
registry. The result is basically a repo that versions a set of
[metafiles](/doc/user-guide/dvc-files-and-directories). Let's see an example:
[metafiles](/doc/user-guide/dvc-files). Let's see an example:

```dvc
$ tree --filelimit=10
4 changes: 2 additions & 2 deletions content/docs/use-cases/sharing-data-and-model-files.md
@@ -67,8 +67,8 @@ with the `dvc push` command:
$ dvc push
```

Code and [DVC-files](/doc/user-guide/dvc-files-and-directories) can be safely
committed and pushed with Git.
Code and [DVC files](/doc/user-guide/dvc-files) can be safely committed and
pushed with Git.

## Download code

11 changes: 5 additions & 6 deletions content/docs/use-cases/versioning-data-and-model-files/index.md
@@ -22,12 +22,11 @@ work!
![](/img/project-versions.png) _DVC matches the right versions of data, code,
and models for you 💘._

DVC enables data _versioning through codification_. You write simple
[metafiles](/doc/user-guide/dvc-files-and-directories) once, describing what
datasets, ML artifacts, etc. to track. This metadata can be put in Git in lieu
of large files. Now you can use DVC to create
[snapshots](/doc/command-reference/add) of the data,
[restore](/doc/command-reference/checkout) previous versions,
DVC enables data _versioning through codification_. You produce simple
[metafiles](/doc/user-guide/dvc-files) once, describing what datasets, ML
artifacts, etc. to track. This metadata can be put in Git in lieu of large
files. Now you can use DVC to create [snapshots](/doc/command-reference/add) of
the data, [restore](/doc/command-reference/checkout) previous versions,
[reproduce](/doc/command-reference/repro) experiments, record evolving
[metrics](/doc/command-reference/metrics), and more!

22 changes: 10 additions & 12 deletions content/docs/use-cases/versioning-data-and-model-files/tutorial.md
@@ -71,9 +71,8 @@ $ pip install -r requirements.txt
The repository you cloned is already DVC-initialized. It already contains a
`.dvc/` directory with the `config` and `.gitignore` files. These and other
files and directories are hidden from user, as typically there's no need to
interact with them directly. See
[DVC Files and Directories](/doc/user-guide/dvc-files-and-directories) to learn
more.
interact with them directly. See [DVC Internals](/doc/user-guide/dvc-internals)
to learn more.

</details>

@@ -137,9 +136,8 @@ intermediate results, etc. It tells Git to ignore the directory and puts it into
the <abbr>cache</abbr> (while keeping a
[file link](/doc/user-guide/large-dataset-optimization#file-link-types-for-the-dvc-cache)
to it in the <abbr>workspace</abbr>, so you can continue working the same way as
before). This is achieved by creating a simple human-readable
[DVC-file](/doc/user-guide/dvc-files-and-directories) that serves as a pointer
to the cache.
before). This is achieved by creating a simple human-readable `.dvc` file that
serves as a pointer to the cache.

Next, we train our first model with `train.py`. Because of the small dataset,
this training process should be small enough to run on most computers in a
@@ -174,8 +172,8 @@ As we mentioned briefly, DVC does not commit the `data/` directory and
then `git commit` DVC-files that contain file hashes that point to cached data.

In this case we created `data.dvc` and `model.h5.dvc`. Refer to
[DVC Files](/doc/user-guide/dvc-files-and-directories) to learn more about how
these files work.
[DVC Files](/doc/user-guide/dvc-files#dvc-files) to learn more about how these
files work.

</details>

@@ -284,14 +282,14 @@ the `v2.0` tag.

<details>

### Expand to learn more about DVC internals
### Expand to learn more about DVC files

As we have learned already, DVC keeps data files out of Git (by adjusting
`.gitignore`) and puts them into the <abbr>cache</abbr> (usually it's a
`.dvc/cache` directory inside the repository). Instead, DVC creates
[DVC-files](/doc/user-guide/dvc-files-and-directories). These text files serve
as data placeholders that point to the cached files, and they can be easily
version controlled with Git.
[DVC files](/doc/user-guide/dvc-files). These text files serve as data
placeholders that point to the cached files, and they can be easily version
controlled with Git.

When we run `git checkout` we restore pointers (DVC-files) first. Then, when we
run `dvc checkout`, we use these pointers to put the right data in the right
5 changes: 2 additions & 3 deletions content/docs/user-guide/basic-concepts/dvc-project.md
@@ -15,6 +15,5 @@ match:

Initialized by running `dvc init` in the **workspace** (typically a Git
repository). It will contain the
[`.dvc/` directory](/doc/user-guide/dvc-files-and-directories), as well as
`dvc.yaml` and `.dvc` files created with commands such as `dvc add` or
`dvc run`.
[`.dvc/` directory](/doc/user-guide/dvc-internals), as well as `dvc.yaml` and
`.dvc` files created with commands such as `dvc add` or `dvc run`.
4 changes: 2 additions & 2 deletions content/docs/user-guide/contributing/docs.md
@@ -169,8 +169,8 @@ is installed when `yarn` runs (see [dev env](#development-environment)).
`dvc`, `yaml`, or `diff` custom languages. `usage` is employed to show the
`dvc --help` output for each command reference. `dvc` can be used to show
examples of commands and their output in a terminal session. `yaml` is used to
show [DVC-file](/doc/user-guide/dvc-files-and-directories) contents or other
YAML data. `diff` is used mainly for examples of `git diff` output.
show [DVC file](/doc/user-guide/dvc-files) contents, or other YAML data.
`diff` is used mainly for examples of `git diff` output.

> Check out the `.md` source code of any command reference to get a better idea,
> for example in
5 changes: 2 additions & 3 deletions content/docs/user-guide/dvc-files.md
@@ -218,8 +218,7 @@ It's created or updated by DVC commands such as `dvc run` and `dvc repro`.
- `dvc.lock` is needed internally for several DVC commands to operate, such as
`dvc checkout`, `dvc get`, and `dvc import`.

Here's an example `dvc.lock` based on the one in [`dvc.yaml`](#dvcyaml-file)
above:
Here's an example `dvc.lock` based on the one in `dvc.yaml` above:

```yaml
stages:
@@ -248,7 +247,7 @@ Regular <abbr>dependencies</abbr> and all kinds of <abbr>outputs</abbr>
[plots](/doc/command-reference/plots) files) are also listed (per stage) in
`dvc.lock`, but with an additional field to store the hash value of each file or
directory tracked by DVC. Specifically: `md5`, `etag`, or `checksum` (same as in
`deps` and `outs` entries of [`.dvc` files](#dvc-files)).
`deps` and `outs` entries of `.dvc` files).

Full <abbr>parameters</abbr> (key and value) are listed separately under
`params`, grouped by parameters file.