DVCLive: Revisit docs organization and content #3923

Merged: 34 commits, Oct 19, 2022

daavoo
Contributor

@daavoo daavoo commented Sep 5, 2022

Closes iterative/dvclive#273
Closes iterative/dvclive#289

Try to focus on the simplest case: Use existing ML integration alongside DVC.

The previous state jumped directly into the API overview and addressed DVC integration on a separate page.
Try to better reflect the high-level steps required to get everything running in the most common case.

@daavoo daavoo added the C: dvclive Content of /doc/dvclive label Sep 5, 2022
@daavoo daavoo self-assigned this Sep 5, 2022
@shcheklein
Member

I think ppl will start with an existing framework (not an index page of the API). We should be prepared for that as much as we can. Let's say I use Keras and I start from the Keras integration page: how many other pages will I have to read to get to a working project that I can understand? That's the metric we should be optimizing in the docs here.

Probably, it means that we can even introduce some duplication and/or a clear 1-2-3 where 2 and 3 are links to some general pages (dvc.yaml, code snippet with DVCLive, etc.).

@daavoo
Contributor Author

daavoo commented Sep 6, 2022

> I think ppl will start with an existing framework (not an index page of the API). We should be prepared for that as much as we can. Let's say I use Keras and I start from the Keras integration page: how many other pages will I have to read to get to a working project that I can understand? That's the metric we should be optimizing in the docs here.

I was thinking people start from Get Started, but will try to also optimize for this case.

> Probably, it means that we can even introduce some duplication and/or a clear 1-2-3 where 2 and 3 are links to some general pages (dvc.yaml, code snippet with DVCLive, etc.).

I updated Get Started and moved the old one to API Reference. I tried to make the current Get Started a step-by-step with links to more details. Do you think the step-by-step should also be present on each ML Framework page?

@shcheklein
Member

So, the concept of loggers is more or less clear and familiar, I guess. That's why, last time I tried, I went right to the Keras page... I expected to see a simple copy-paste snippet to get started.

Get Started itself is good for beginners in this case, I think.

> Do you think the step-by-step should also be present on each ML Framework page?

I think it would help a lot if we make those pages self-contained, with an easy way to copy-paste without jumping around and still get decent results.

@jorgeorpinel
Contributor

jorgeorpinel commented Sep 7, 2022

> I think ppl will start with an existing framework

> I was thinking people start from Get Started

Here are top entry pages for DVCLive now:

[Image: top entry pages for DVCLive]

Looks like frameworks and DVC integration trump the Get Started quite a bit. And for the DVCLive docs home page (top entry page) most traffic comes from Google (mainly direct tool name searches):

[Image: traffic sources for the DVCLive docs home page]

Top queries:

[Image: top search queries]

It's also interesting that / is listed among top entry pages (first figure). Maybe we should add a direct link to DVCLive in the site home page? If so let's comment in #3833

Should've probably discussed and planned in iterative/dvclive#273 first (sorry I didn't notice it earlier) but let's check this PR since we have it!

Comment on lines 3 to 7
When using [DVC Checkpoints](/doc/user-guide/experiment-management/checkpoints)
and/or enabling DVCLive's [`resume`](/doc/dvclive/api-reference/live#parameters)
you need to add the flag
[`persist: true`](/doc/user-guide/project-structure/pipelines-files#output-subfields)
to all DVCLive outputs.
Contributor

Probably needs more motivation (why do this/ how is it useful) closer to what we have now in https://dvc.org/doc/dvclive/dvclive-with-dvc#checkpoints.

Contributor Author

I was trying to make this a concise how-to page. I would expect the first linked page (user-guide/experiment-management/checkpoints) to cover the motivation for checkpoints, and people to come to this page for specific guidance on proper setup, not for motivation for the feature.

Comment on lines 9 to 13
Adding `--type checkpoint` to `dvc exp init` will take care of doing this when
generating the `dvc.yaml`:

```dvc
$ dvc exp init \
```
Contributor

Is this the recommended "happy path" method? The intro above is already a bit complicated conceptually (mentioning checkpoints and output persistence), so also introducing experiments isn't ideal. If possible, move it after the YAML sample as a tip / alternative easy way to get there.

Contributor Author

I changed the order. As commented above, the scope here was a concise explanation of how to set up for resuming training, not to introduce the concepts to dvclive users.

Contributor

Thanks for addressing my feedback. Sorry I missed the updates until now...

Comment on lines -1 to -11
# DVCLive with DVC

Even though DVCLive does not require DVC, they can integrate in a couple useful
ways:

- The [outputs](#outputs) DVCLive produces are recognized by `dvc exp`,
`dvc metrics` and `dvc plots`. Those same outputs can be visualized in
[Iterative Studio](#iterative-studio).

- DVCLive is also capable of generating [checkpoint](#checkpoints) signal files
used by DVC <abbr>experiments<abbr>.
Contributor

This summary was nice. If we remove it, should we mention these things more in DVC docs and link to the new DVCLive how-tos?

Contributor Author

I believe the first bullet point is now covered in the get started https://dvc-org-dvclive-refacto-czwjzz.herokuapp.com/doc/dvclive/get-started#dvc

Regarding the second bullet point, there is a detailed page about checkpoints in DVC user guide and it already mentions/points to DVCLive.

Collaborator

Agree with @jorgeorpinel that I liked having a summary of all the DVC magic that DVCLive adds. Otherwise, it feels like readers aren't quite sure what DVCLive is doing.

Comment on lines -14 to 12
<card href="/doc/dvclive/dvclive-with-dvc" heading="DVCLive with DVC">
Discover how DVCLive and DVC can integrate in several useful ways
</card>

<card href="/doc/dvclive/ml-frameworks" heading="ML Frameworks">
Use DVCLive alongside your favorite ML Framework
A step-by-step introduction
</card>
Contributor

Not sure I get why we removed these 2 from the docs home page if that's what we want to focus on.

Contributor Author

The idea is that dvclive-with-dvc is now covered in Get Started and we want to drive users to Get Started as much as possible to clarify the workflow.

Comment on lines 93 to 96
## Outputs

After you run your training code, you should see the following content in the
project:
Contributor

I was never sure about the usage of the term "output" here. It can be confusing in the context of DVC (stage outs). Maybe "project structure" or "resulting files" or something like that?

Contributor Author

Renamed to Output folder structure

Contributor Author

wdyt?

Contributor

Better, thanks. I still would remove "output", especially since the term is not even used in the contents (other than the title).

@jorgeorpinel jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label Sep 8, 2022
Collaborator

@dberenbaum dberenbaum left a comment

Looks great -- only minor comments this time. Once those are addressed, I think it's mergeable.

Co-authored-by: Dave Berenbaum
Collaborator

@dberenbaum dberenbaum left a comment

Thanks @daavoo! Nice improvement here.

@dberenbaum dberenbaum merged commit 5bf3d8e into main Oct 19, 2022
@dberenbaum dberenbaum deleted the dvclive-refactor branch October 19, 2022 17:00
Comment on lines +95 to +96
"^/doc/dvclive/dvclive-with-dvc$ /doc/dvclive/get-started",
"^/doc/dvclive/ml-frameworks$ /doc/dvclive/api-reference/ml-frameworks",
Contributor

Update on #3923 (comment) (relates to the redirects above):

[Image: updated top entry pages for DVCLive]

> Looks like frameworks and DVC integration trump the Get Started quite a bit.

Traffic in general is up, esp. to DVCLive's home page and somewhat to the Get Started (👍🏼). But the ML frameworks and DVC integration pages have less traffic now 🤷🏼

Contributor

p.s. I just realized it's only been a week since this was merged so let's check again later.
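
For readers unfamiliar with the redirect rules quoted above: each entry pairs an anchored regular expression with a target path. A minimal sketch of how such a rule could be resolved, assuming a hypothetical `resolve` helper and first-match-wins semantics (the actual dvc.org redirect handling may differ):

```python
import re

# Redirect rules in the "pattern target" form quoted in this thread.
REDIRECTS = [
    "^/doc/dvclive/dvclive-with-dvc$ /doc/dvclive/get-started",
    "^/doc/dvclive/ml-frameworks$ /doc/dvclive/api-reference/ml-frameworks",
]


def resolve(path, rules=REDIRECTS):
    """Return the redirect target for path, or path itself if no rule matches.

    Hypothetical helper for illustration only: the real site may handle
    status codes, capture groups, or trailing slashes differently.
    """
    for rule in rules:
        # Each rule is "<regex> <target>", separated by the first space.
        pattern, target = rule.split(" ", 1)
        if re.search(pattern, path):
            return target
    return path


print(resolve("/doc/dvclive/dvclive-with-dvc"))
print(resolve("/doc/dvclive/ml-frameworks/keras"))
```

Because both patterns end in `$`, this sketch leaves a subpage such as `/doc/dvclive/ml-frameworks/keras` untouched; whether the real site covers subpages through other rules is not shown in this thread.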

Comment on lines +12 to +13
<toggle>
<tab title="Keras">
Contributor

Are there criteria for the tabs we include here and their order? Just curious...

[Image]

E.g. rn the top 4 with most traffic are: PyTorch, Hugging Face, TensorFlow, Keras (in that order).


<toggle>
<tab title="Scalars">
<tab title="Python API">
Contributor

Slightly unclear what this means: 1) All the samples are Python; 2) There's no "Python API" under ML Frameworks. Maybe link to https://dvc.org/doc/dvclive/api-reference/live for this one? Currently they all have the same comment under:

> Check the ML Frameworks page for more details and other supported frameworks.

Learn more in the
[Comparing Experiments](/doc/user-guide/experiment-management/comparing-experiments)
and [Visualizing Plots](/doc/user-guide/experiment-management/visualizing-plots)
pages of the user guide.
Contributor

💅🏼

Suggested change
pages of the user guide.
pages.


## What next?
### Share Results
Contributor

Seems like this should be an H2 to me (since it's more about Studio than the DVC integration). This would also create an entry in the right side CONTENTS nav.

Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: dvclive Content of /doc/dvclive
5 participants