From dc003c1511ba58305f6fe8666f09e104a1cc373b Mon Sep 17 00:00:00 2001 From: imhardikj Date: Sat, 18 Jul 2020 22:22:03 +0530 Subject: [PATCH 1/5] removed Dvcfile --- content/docs/command-reference/push.md | 52 ++++++++++++++++---------- 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index e5b1aacf15..ba51b1b7f0 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -155,14 +155,28 @@ a [pipeline](/doc/command-reference/pipeline) has been setup with these [stages](/doc/command-reference/run): ```dvc -$ dvc pipeline show -data/Posts.xml.zip.dvc -Posts.xml.dvc -Posts.tsv.dvc -Posts-test.tsv.dvc -matrix-train.p.dvc -model.p.dvc -Dvcfile +$ dvc dag + +-------------+ + | prepare.dvc | + +-------------+ + * + * + * + +---------------+ + | featurize.dvc | + +---------------+ + ** ** + ** ** + ** ** ++-----------+ ** +| train.dvc | ** ++-----------+ ** + ** ** + ** ** + ** ** + +--------------+ + | evaluate.dvc | + +--------------+ ``` Imagine the projects has been modified such that the @@ -170,22 +184,22 @@ Imagine the projects has been modified such that the [remote storage](/doc/command-reference/remote). ```dvc -$ dvc status --cloud - - new: data/model.p - new: data/matrix-test.p - new: data/matrix-train.p +$ dvc status -c + deleted: data/features/test.pkl + deleted: data/features/train.pkl + deleted: model.pkl + ... ``` One could do a simple `dvc push` to share all the data, but what if you only want to upload part of the data? ```dvc -$ dvc push --with-deps matrix-train.p.dvc +$ dvc push --with-deps featurize.dvc ... Do some work based on the partial update -$ dvc push --with-deps model.p.dvc +$ dvc push --with-deps evaluate.dvc ... Push the rest of the data @@ -194,11 +208,11 @@ $ dvc status --cloud Data and pipelines are up to date. ``` -We specified a stage in the middle of this pipeline (`matrix-train.p.dvc`) with -the first push. `--with-deps` caused DVC to start with that `.dvc` file, and -search backwards through the pipeline for data files to upload. +We specified a stage in the middle of this pipeline (`featurize.dvc`) with the +first push. `--with-deps` caused DVC to start with that `.dvc` file, and search +backwards through the pipeline for data files to upload. -Because the `model.p.dvc` stage occurs later (it's the last one), its data was +Because the `evaluate.dvc` stage occurs later (it's the last one), its data was not pushed. However, we then specified it in the second push, so all remaining data was uploaded. From ec42aa9f89483f764f32edcfcff85d78e07eaf5f Mon Sep 17 00:00:00 2001 From: imhardikj Date: Sun, 19 Jul 2020 04:01:30 +0530 Subject: [PATCH 2/5] updates --- content/docs/command-reference/push.md | 66 +++++++++++++------------- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index ba51b1b7f0..792944cc27 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -156,27 +156,27 @@ a [pipeline](/doc/command-reference/pipeline) has been setup with these ```dvc $ dvc dag - +-------------+ - | prepare.dvc | - +-------------+ - * - * - * - +---------------+ - | featurize.dvc | - +---------------+ - ** ** - ** ** - ** ** -+-----------+ ** -| train.dvc | ** -+-----------+ ** - ** ** - ** ** - ** ** - +--------------+ - | evaluate.dvc | - +--------------+ + +---------+ + | prepare | + +---------+ + * + * + * + +-----------+ + | featurize | + +-----------+ + ** ** + ** * + * ** ++-------+ * +| train | ** ++-------+ * + ** ** + ** ** + * * + +----------+ + | evaluate | + +----------+ ``` Imagine the projects has been modified such that the @@ -185,21 +185,21 @@ Imagine the projects has been modified such that the ```dvc $ dvc status -c - deleted: data/features/test.pkl - deleted: data/features/train.pkl - deleted: model.pkl - ... + new: data/featurize/train.pkl + new: data/featurize/train.pkl + new: data/prepared/train.tsv + new: data/prepared/test.tsv ``` One could do a simple `dvc push` to share all the data, but what if you only want to upload part of the data? ```dvc -$ dvc push --with-deps featurize.dvc +$ dvc push --with-deps featurize ... Do some work based on the partial update -$ dvc push --with-deps evaluate.dvc +$ dvc push --with-deps evaluate ... Push the rest of the data @@ -208,13 +208,13 @@ $ dvc status --cloud Data and pipelines are up to date. ``` -We specified a stage in the middle of this pipeline (`featurize.dvc`) with the -first push. `--with-deps` caused DVC to start with that `.dvc` file, and search -backwards through the pipeline for data files to upload. +We specified a stage in the middle of this pipeline (`featurize`) with the first +push. `--with-deps` caused DVC to start with this stage, and search backwards +through the pipeline for data files to upload. -Because the `evaluate.dvc` stage occurs later (it's the last one), its data was -not pushed. However, we then specified it in the second push, so all remaining -data was uploaded. +Because the `evaluate` stage occurs later (it's the last one), its data was not +pushed. However, we then specified it in the second push, so all remaining data +was uploaded. Finally, we used `dvc status` to double check that all data had been uploaded. From db08e42997734fcc38bf9655eab34f48ce41d655 Mon Sep 17 00:00:00 2001 From: imhardikj Date: Wed, 22 Jul 2020 01:46:22 +0530 Subject: [PATCH 3/5] dvc dag use --- content/docs/command-reference/push.md | 73 ++++++++++++++------------ 1 file changed, 40 insertions(+), 33 deletions(-) diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 792944cc27..2a1e61c625 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -156,27 +156,34 @@ a [pipeline](/doc/command-reference/pipeline) has been setup with these ```dvc $ dvc dag - +---------+ - | prepare | - +---------+ - * - * - * - +-----------+ - | featurize | - +-----------+ - ** ** - ** * - * ** -+-------+ * -| train | ** -+-------+ * - ** ** - ** ** - * * - +----------+ - | evaluate | - +----------+ + +------------------------+ + | data/Posts.xml.zip.dvc | + +------------------------+ + * + * + * + +-----------+ + | Posts-xml | + +-----------+ + * + * + * + +-----------+ + | Posts-tsv | + +-----------+ + *** *** + ** *** + ** ** ++----------------+ ** +| Posts-test-tsv | ** ++----------------+ *** + *** *** + ** ** + ** ** + +--------------+ + | matrix-train | + +--------------+ + ``` Imagine the projects has been modified such that the @@ -184,22 +191,22 @@ Imagine the projects has been modified such that the [remote storage](/doc/command-reference/remote). ```dvc -$ dvc status -c - new: data/featurize/train.pkl - new: data/featurize/train.pkl - new: data/prepared/train.tsv - new: data/prepared/test.tsv +$ dvc status --cloud + + new: data/model.p + new: data/matrix-test.p + new: data/matrix-train.p ``` One could do a simple `dvc push` to share all the data, but what if you only want to upload part of the data? ```dvc -$ dvc push --with-deps featurize +$ dvc push --with-deps Posts-tsv ... Do some work based on the partial update -$ dvc push --with-deps evaluate +$ dvc push --with-deps matrix-train ... Push the rest of the data @@ -208,13 +215,13 @@ $ dvc status --cloud Data and pipelines are up to date. ``` -We specified a stage in the middle of this pipeline (`featurize`) with the first -push. `--with-deps` caused DVC to start with this stage, and search backwards +We specified a stage in the middle of this pipeline (`Posts-tsv`) with the first +push. `--with-deps` caused DVC to start with that stage, and search backwards through the pipeline for data files to upload. -Because the `evaluate` stage occurs later (it's the last one), its data was not -pushed. However, we then specified it in the second push, so all remaining data -was uploaded. +Because the `matrix-train` stage occurs later (it's the last one), its data was +not pushed. However, we then specified it in the second push, so all remaining +data was uploaded. Finally, we used `dvc status` to double check that all data had been uploaded. From 647db554efa71d1e9848385eab4cc686bcab9654 Mon Sep 17 00:00:00 2001 From: imhardikj Date: Wed, 22 Jul 2020 01:54:58 +0530 Subject: [PATCH 4/5] cat dvc.yaml --- content/docs/command-reference/push.md | 63 ++++++++++++++------------ 1 file changed, 34 insertions(+), 29 deletions(-) diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 2a1e61c625..494e038b5c 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -155,35 +155,40 @@ a [pipeline](/doc/command-reference/pipeline) has been setup with these [stages](/doc/command-reference/run): ```dvc -$ dvc dag - +------------------------+ - | data/Posts.xml.zip.dvc | - +------------------------+ - * - * - * - +-----------+ - | Posts-xml | - +-----------+ - * - * - * - +-----------+ - | Posts-tsv | - +-----------+ - *** *** - ** *** - ** ** -+----------------+ ** -| Posts-test-tsv | ** -+----------------+ *** - *** *** - ** ** - ** ** - +--------------+ - | matrix-train | - +--------------+ - +$ cat dvc.yaml +stages: + Posts-xml: + cmd: unzip data/Posts.xml.zip -d data/ + deps: + - data/Posts.xml.zip + outs: + - data/Posts.xml + Posts-tsv: + cmd: python3 src/Posts.py + deps: + - data/Posts.xml + - src/Posts.py + outs: + - Posts.tsv + Posts-test-tsv: + cmd: python3 src/Posts-test.py + deps: + - Posts.tsv + - src/Posts-test.py + outs: + - Posts-test.tsv + matrix-train: + cmd: python3 src/matrix-train.py + deps: + - Posts-test.tsv + - Posts.tsv + - src/matrix-train.py + outs: + - data/matrix-test.p + - data/matrix-train.p + - data/model.p + params: + - matrix-train.train ``` Imagine the projects has been modified such that the From 8d26783c6a73ade596c8744b3b934776c1128072 Mon Sep 17 00:00:00 2001 From: imhardikj Date: Wed, 22 Jul 2020 19:07:31 +0530 Subject: [PATCH 5/5] updates --- content/docs/command-reference/push.md | 50 +++++++------------------- 1 file changed, 12 insertions(+), 38 deletions(-) diff --git a/content/docs/command-reference/push.md b/content/docs/command-reference/push.md index 494e038b5c..da310c24b0 100644 --- a/content/docs/command-reference/push.md +++ b/content/docs/command-reference/push.md @@ -155,40 +155,14 @@ a [pipeline](/doc/command-reference/pipeline) has been setup with these [stages](/doc/command-reference/run): ```dvc -$ cat dvc.yaml -stages: - Posts-xml: - cmd: unzip data/Posts.xml.zip -d data/ - deps: - - data/Posts.xml.zip - outs: - - data/Posts.xml - Posts-tsv: - cmd: python3 src/Posts.py - deps: - - data/Posts.xml - - src/Posts.py - outs: - - Posts.tsv - Posts-test-tsv: - cmd: python3 src/Posts-test.py - deps: - - Posts.tsv - - src/Posts-test.py - outs: - - Posts-test.tsv - matrix-train: - cmd: python3 src/matrix-train.py - deps: - - Posts-test.tsv - - Posts.tsv - - src/matrix-train.py - outs: - - data/matrix-test.p - - data/matrix-train.p - - data/model.p - params: - - matrix-train.train +$ dvc pipeline show +data/Posts.xml.zip.dvc +Posts.xml.dvc +Posts.tsv.dvc +Posts-test.tsv.dvc +matrix-train.p.dvc +model.p.dvc +Dvcfile ``` Imagine the projects has been modified such that the @@ -207,7 +181,7 @@ One could do a simple `dvc push` to share all the data, but what if you only want to upload part of the data? ```dvc -$ dvc push --with-deps Posts-tsv +$ dvc push --with-deps test-posts ... Do some work based on the partial update @@ -220,9 +194,9 @@ $ dvc status --cloud Data and pipelines are up to date. ``` -We specified a stage in the middle of this pipeline (`Posts-tsv`) with the first -push. `--with-deps` caused DVC to start with that stage, and search backwards -through the pipeline for data files to upload. +We specified a stage in the middle of this pipeline (`test-posts`) with the +first push. `--with-deps` caused DVC to start with that `.dvc` file, and search +backwards through the pipeline for data files to upload. Because the `matrix-train` stage occurs later (it's the last one), its data was not pushed. However, we then specified it in the second push, so all remaining