Get started - "top funnel" changes #4460

Merged
merged 10 commits into from
Apr 13, 2023
omesser
Contributor

@omesser omesser commented Apr 10, 2023

motivation and explanation

I'm issuing this PR as a start for some of the changes I had in mind.
I still believe our get started is complex, bloated and intimidating, and this PR doesn't solve all of this. But I wanted to keep the changes somewhat iterative to allow efficient discussion and cut scope per change in case my ideas of a better get-started guide don't align with others.

in this PR (HL changes)

  • Get started index page got re-shuffled a bit. "Following This Guide" welcomes the reader and explains the scenarios/trails/tracks before the users "have to make choices", to try and guide them through this.
  • The "trails" are now consistently "data management" and "experiment management". Language was adjusted so "management" is a high-level term, and the individual chapters detail what it means to "manage" data / experiments.
  • some chapters reordered or unified or renamed. Now there are 3 chapters per trail;
    • data and model access - was moved away from get-started to user-guide/data-management, and renamed to avoid confusion with "data access" which has different meaning in SWE than what we refer to here.
    • experiment versioning is now live-tracking -> experiment tracking
    • experiment management is now experiment-collaboration -> collaborating on experiments
    • experiment iterations content was appended to experimenting using pipelines

out of scope - in followup PRs

Condensing, cutting some content / moving it to how-tos or use-cases section. Possibly cleaning up some distractions (tons of info/tip admons side-tracking the reader, and links to other places harming the linear experience of going through a get-started tutorial).

  • Re-thinking / focusing experiment-pipeline page to not mainly repeat data-pipelines
  • Condensing the index page - only mentioning the first of each "trail" - try condensing the "trails" to single page

open questions I still have in mind

  • data: metrics-parameters-and-plots - should this be in experiments?
  • pipelines - there's some ambiguity between "data pipelines" and "experiments" in DVC. Experiments are actually persisted pipeline executions, and I'm not sure that's clear enough. Also not sure it's easily digestible.

cc @daavoo @shcheklein and others for reviews and opinions.

Check out the review app here: https://dvc-org-get-started-top-nr5wyh.herokuapp.com/doc/start

@omesser omesser requested a review from a team as a code owner April 10, 2023 15:07
@github-actions
Contributor

github-actions bot commented Apr 10, 2023

Link Check Report

There were no links to check!

@omesser omesser requested a review from shcheklein April 10, 2023 17:43
Collaborator

@dberenbaum dberenbaum left a comment

Thanks @omesser! I like that it makes meaningful changes without doing so much that it's overwhelming.

Thoughts on the questions you listed:

data: metrics-parameters-and-plots - should this be in experiments?

It feels like we are caught in between introducing experiments at the end of the data management section and leaving them out. We should either add dvc exp run here or leave this page out completely.

pipelines - there's some ambiguity between "data pipelines" and "experiments" in DVC. Experiments are actually persisted pipeline executions, and I'm not sure that's clear enough. Also not sure it's easily digestible.

Yeah, there's duplication, which would be okay except it makes things unclear. It relates to #4375. I think we either need a separate pipelines trail or we need to fully explain what's the same and what's different between the data pipelines and experiment pipelines.

One idea to move this forward:

  1. Consolidate all the pipeline setup at the top of /doc/start/experiments/experiment-pipelines by referring back to /doc/start/data-management/data-pipelines and /doc/start/data-management/metrics-parameters-plots. We can show how to convert at most the first stage in example-get-started-experiments and mention that the rest follow that same data pipelines process.
  2. Follow up after this PR with another attempt to add a separate pipelines trail or otherwise reorganize the pipelines content between the existing trails.

content/docs/start/experiments/experiment-pipelines.md (outdated)
Comment on lines +209 to +213
Now that you have a <abbr>DVC Pipeline</abbr> set up, you can easily iterate on
it by running `dvc exp run` to create and track new experiment runs. This
enables some new features in DVC like Queueing experiments, and a canonical way
to work with parameters and hyper-parameters.

Collaborator

I like the idea of adding this all to the experiments-pipelines page, but it feels like way too long to get to this part, which should really be the focus of experiment pipelines (everything else is basically setup).

Contributor Author

added similar paragraph to the beginning, and i'll compact and 🔪 this content more in a followup PR

Contributor Author

Ended up expanding scope a bit and editing this content so we don't hang in / explain too much about the basic pipeline building stuff (add stage), but mention the commands and progress

@shcheklein
Member

One comment re One idea to move this forward: and trails. The whole idea of trails was to make them self-sufficient (you don't have to go into another trail to see that pipeline or exps are a possible evolution of data - we kinda show you the next steps right there, logical to the project and the context you are in).

That's why we have two separate self-sufficient e2e projects, for example.

If we feel that this is too complicated we should consider getting back to a single, simplified version of the get started. A single project, a few pages, etc, etc. The downside of that approach usually is that it's hard to start with both dvc add and dvc exp run and dvc stage add.

@daavoo
Contributor

daavoo commented Apr 11, 2023

That's why we have two separate self-sufficient e2e projects, for example.
If we feel that this is too complicated we should consider getting back to a single, simplified version of the get started. A single project, a few pages, etc, etc. The downside of that approach usually is that it's hard to start with both dvc add and dvc exp run and dvc stage add.

As we have been iterating, I am more strongly convinced that we should do isolated Data, Experiments and Pipelines as get-started pages.

Make them about showing specific commands:

  • Data: add/push-pull/remote add.
  • Experiments: DVCLive/exp show/exp branch/exp push-pull.
  • Pipelines: stage add/exp run.

On top of the get-started pages and as an optimal follow-up: A single, end-to-end, reproducible tutorial that builds on the 3 components.
The tutorial doesn't explain any command, assumes get started has been followed, and focuses instead on explaining workflow around the components.

For anything outside that or more details, dispatch to User Guide.
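
To make the proposed split easier to compare, here is a minimal sketch that restates the three pages and the items listed for each in the comment above. The dictionary and the `outline` helper are hypothetical names used only for illustration; the entries are expanded from the shorthand in the list (for example, "push-pull" as `dvc push` and `dvc pull`), and DVCLive is a library rather than a command.

```python
# Hypothetical outline of the three proposed get-started pages.
# Entries are expanded from the shorthand in the comment above;
# the structure and names here are assumptions for illustration only.
GET_STARTED_PAGES = {
    "Data": ["dvc add", "dvc push", "dvc pull", "dvc remote add"],
    "Experiments": [
        "DVCLive",  # a library, not a CLI command
        "dvc exp show",
        "dvc exp branch",
        "dvc exp push",
        "dvc exp pull",
    ],
    "Pipelines": ["dvc stage add", "dvc exp run"],
}


def outline(pages):
    """Return one summary line per page, in insertion order."""
    return [f"{page}: {', '.join(items)}" for page, items in pages.items()]


if __name__ == "__main__":
    for line in outline(GET_STARTED_PAGES):
        print(line)
```

Under this split, `dvc exp run` appears only on the Pipelines page, which is the point where the pipelines-versus-experiments ambiguity raised earlier in the thread would have to be addressed.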


</admon>

## Data Management
Contributor

WDYT about not mentioning the subpages at this point and just directing the users to the first page of each "trail"?

I think it would:

  • reduce verbosity
  • save users from having to choose where to start
  • save potential confusion/discussion about the subtitles

Contributor Author

That's an interesting idea, but I think it would behave more consistently only if we actually unify data/exp to 1 page each (yes, it would be long and require scrolling... but that's fine, who said a lot of short pages is better).
Ok if I try this in an individual followup PR and we'll give it a try and get a feel for it?

Contributor

if we actually unify data/exp to 1 page each (yes, it would be long and require scrolling.... but that's fine, who said a lot of short pages is better)

Consolidating into 1 page makes sense to me.

Tangential discussion (https://iterativeai.slack.com/archives/C020JRZ3UN8/p1681221501055689?thread_ts=1680629942.209259&cid=C020JRZ3UN8) but feels like the design of the page contributes to the "long" feeling.

Ok if I try this in an individual followup PR and we'll give it a try and get a feel for it?

Sure

Member

No matter if we consolidate or not (again, we had it before in different ways and forms already :) ) - I agree with @daavoo - tbh not clear why we should mention individual subpages. Makes it too lengthy.

Contributor Author

@omesser omesser Apr 12, 2023

@shcheklein - I think it is coupled with single/non-single page honestly, because:

  • now chapters are self-contained in a sense, you can drop in on any of them. Making only 1 chapter an entry point makes the next ones automatically non-entry points, and that would require more content changes. It would basically be 1 piece of content per trail, either a single page or some wizard with steps (pages), but it would make it so no one would land on a non-first chapter, so there'd be no point in keeping them all self-contained.
  • most people don't notice the next/prev buttons, and navigate by links, so they might miss those next pages without proper hooks (press next to read about...)

Contributor Author

Again, if no strong objections, let's leave this to a followup PR please

Member

no strong opinion, I see your point, I would still decouple it (from a single page decision, there was no intention to do it this way initially, I'm not sure it's 100% done this way, etc) and simplify the page. If get started starts with a 2-3 screen page just to navigate it - something is wrong with it fundamentally.

To clarify, I think there are many ways to keep even all these links, give ppl about the structure and keep it preferably ~1 screen. (e.g. reduce explanations, embed aggressively into paragraphs on the main trails explanation, put a list of links one by one after the trail desc, to cards, etc, etc).

Anyways, to clarify, not a blocker on my end.

@omesser omesser requested a review from dberenbaum April 11, 2023 14:25
from the list above, left-side navigation, or just click `NEXT` below!

</details>
Pick a page from the list above, the left-side navigation bar, or just click
Member

Not sure it's needed here tbh

Contributor Author

This was here before in some other form as you can see. I thought about removing it entirely, but I guess having some navigation-helper text in SOME place is helpful, so we have it in 1 place, and what other place is more suitable than the first page a new user sees.
Maybe it's ok to keep since, like I mentioned in another comment, keep in mind users might not notice the next/prev buttons 😄

Member

I think it's related to the discussion on this page content above. I think the page is way toooooo long. Also, we already have a lot of navigation help there. I think two links with some descriptions to the top pages of trails should be enough to serve the "navigational" purpose.

Contributor Author

@omesser omesser Apr 12, 2023

Oh, that's a different thread then, let's try to separate from this single sentence please.

Pick a page from the list above, the left-side navigation bar, or just click

This content was here in this form before:
Screenshot 2023-04-12 at 23 42 42
I've:

  • split it - chapter navigation help should come BEFORE the chapter links
  • got rid of the expanding detail example, it's self explanatory imo
  • reworded a bit its contents

Do you still want this completely removed?

Member

Oh, that's a different thread then, let's try to separate from this single sentence please.

they are related to my mind, since this sentence serves a navigational purpose and the whole content on the page does the same I think.

This content was here in this form before:

Yep, it was more or less an educational stub about the expandable section. So, it has a completely different meaning to me now. And in the new form (very different, and there for a different reason) it feels excessive to me.

Do you still want this completely removed?

your call, I shared my opinion on this (and on the whole page), it's not a blocker :)

split it - chapter navigation help should come BEFORE the chapter links

It depends on the page length. If everything fits into one screen (no scroll) - it doesn't matter much where it stays. I think initially the page was way smaller (before we started introducing trails).

Contributor Author

If it's ok with you then, let's keep it for now.
I will do a followup PR as I mentioned to try and condense more content and maybe unify pages in the trails, and this will affect this as well.
As per fitting in one screen or not - I think order has to be correct regardless personally. people have different screen sizes, orientation, may read this on mobile / tablet and resize, so we shouldn't count on it too much

Member

sounds good!

condense more content and maybe unify pages in the trails, and this will affect this as well

My 2cs.

I don't feel it's a top priority tbh. If we decide to do this I would think about unifying and making it a single trail again (which has certain challenges).

Things like - let's remove a bit of admons / expandables and/or move content into one page w/o deeply changing it - my personal take - they won't move the needle.

We went from a single page (Quick Start) -> to 3-4 pages (no experiments, no metrics yet, no import/get) -> 7-8 pages with import/get, metrics -> to 3-4 that combine all that stuff -> to now trails since we want to make experiments a bit more of first-class citizen. In my experience - I didn't see much difference so far in all of those.

It well might be that now we have more ppl on the website, etc (environment is different).

Anyways, just saying - 2 trails with multiple pages most likely is the same as a single trail and most likely is more or less the same as 2 trails with a single page. The meaningful change is to make it like MLEM, but in case of DVC I don't think it's possible tbh. They are very different products, with different learning curves, etc, etc.

@@ -1,14 +1,15 @@
 ---
-title: 'Get Started: Building Pipelines'
+title: 'Get Started: Experiment Pipelines'
Member

it's a bit of a strange title to me - they are just pipelines, I don't think we use or split pipelines by their type. In all scenarios pipelines are about processing data + running training + eval, etc, etc. I'm not sure I understand what is so unique about these specific pipelines ... maybe we need to rephrase this ... make it more practical - Making exps reproducible?

Contributor Author

@omesser omesser Apr 12, 2023

I think you're hitting a much larger issue here (design? product?). one that get-started doc cannot fully resolve, and def not in this PR :) why do we have dvc exp run and not dvc pipeline run ? 😉
Using the experiment commands and terminology as synonyms to pipelines is indeed super confusing... I think this is another reason why @daavoo and @dacbd are in favour of consolidating the pipeline contents to its own section (common to data-pipelines as well as exp-pipelines experience). This would make sense in this respect, but would create another discrepancy where the pipeline content won't be associated properly with experiment section... and then experiment section will be "just" dvclive experimentation
It's a deeper issue indeed.

Making exps reproducible

Just my opinion here, but I don't think this is a good title for get started. it's too "advanced", focused on a feature of a thing and not the thing. The deeper issue here is that we just have ambiguity about what is an experiment, and what is a pipeline.

Contributor Author

@omesser omesser Apr 12, 2023

To trace back my mental steps, I initially called this: experiments as pipelines
But it didn't look as clear to me as a title / section and page name, so I condensed it a bit 😄
@shcheklein - is this a blocker to you? we can spitball some more title ideas here, but I think this is short, punchy, and actually explained in the beginning of the chapter itself

Member

so, let's do just "pipelines" for now? My point is - let's not invent and reinvent wheels please (like live tracking, etc). It's important to keep it simple. Any adjective immediately raises a lot of questions. It's better to be very explicit and "dumb" vs smart and eloquent, etc. I hope that makes sense.

Again, in this case I would personally go with some actual task that we are solving. Even pipelines are too abstract to my mind for this section.

Contributor Author

@omesser omesser Apr 12, 2023

so, let's do just "pipelines" for now?

The point of this PR is to get some order and logic into those chapters so they make sense in the context of the trails and be meaningful. This chapter deals with using pipelines for experiments and using dvc exp run on them. It has to be distinct from the data pipelines chapter under data-management and associated with experiments somehow to be meaningful and clear.
This is not inventing or re-inventing anything here, just removing vagueness and confusion that hit you when you're reading this page today.

I will go with "Experimenting using pipelines" in this case, and will have to rename other chapters to make them a bit more consistent, it will make them less compact.
I don't think it's better but that's the only other way to reconcile our differences here I think. I truly think that naming this "pipelines" with the current structure is counterproductive.

It's important to keep it simple. Any adjective immediately raises a lot of questions.

I am a firm believer that clarity is more important than brevity 😄

Again, in this case I would personally go with some actual task that we are solving. Even pipelines are too abstract to my mind for this section.

I'm sorry I'm not getting this part I guess. This is not a real life tutorial but get-started, but I think "experimenting using pipelines" is maybe close to what you meant here. If I understand correctly

Member

I am a firm believer that clarity is more important than brevity 😄

yes 100%, imo, and in this case Building Pipelines is more clear vs "Experiments Pipelines". That's more or less the essence of my comment.

This is not inventing or re-inventing anything here, just removing vagueness and confusion that hit you when you're reading this page today.

if you are trying to optimize for the case when a person lands on this page and you want them to see "Experiments" (it's already in the URL, and it should hopefully be clear from the text that this belongs to a trail, but anyways), I would make the title (not the sidebar entry) something like "Experiments: Building Pipelines" (just an example).

My point in all this discussion (similar to the live ..) is that any adjective or modifier doubles the mental effort needed to understand this. E.g. "Simple Pipelines" (are there complex ones? what's the difference, etc). In this case I read "Experiments" as a modifier (not as a context) and it raises questions in my head.

It has to be distinct from the data pipelines chapter under data-management and associated with experiments somehow to be meaningful and clear.

It doesn't have to I think, btw. They both could have the same title - and I would be personally fine with that. I think we are a bit overcomplicating this. Also, clearly pipelines is a topic on which we don't have a clear decision / discussion yet.

Contributor Author

@omesser omesser Apr 12, 2023

I understand, we may not be in agreement about the role of modifiers and adjectives though.
To try and move this forward I pushed the change so it is now "Experimenting using pipelines".
If/when we condense the content further so "experiments" will be a single page, or multi-page but with a specific entry points, and then "pipelines" or "building pipelines" will just be a part of it in a clear context, I would agree with your logic here completely, but currently, with chapters being self-sufficient I want the differentiation to be very clear. Keep in mind that this is the landing page for get-started, everything has to be clear and make sense. if there's a "wtf moment" (where user leans back, raises an eyebrow because things are confusing or unclear 😆 ) - we're losing them I think

clearly pipelines is a topic on which we don't have a clear decision / discussion yet.

Many discussions from what I've heard ;) not a decision

PTAL at the changes now, tell me if we're good to go

Member

it's ready when you feel it's ready :)

but currently, with chapters being self-sufficient I want the differentiation to be very clear

yep, as a reminder - there are ways to do this by making the title (what you see on the page, not the sidebar) very explicit - you have way more space there and it's good for SEO.

@omesser omesser dismissed dberenbaum’s stale review April 12, 2023 20:39

re-review please?

@omesser
Contributor Author

omesser commented Apr 13, 2023

@dberenbaum @daavoo @shcheklein - would love to get an approval to unblock this, or review if you still see something that bothers you in this scope or that regressed from current upstream that I might have missed

Thanks!

@omesser omesser merged commit c9762c2 into main Apr 13, 2023
@omesser omesser deleted the get_started_top_funnel branch April 13, 2023 16:20
@dberenbaum
Collaborator

Thanks @omesser! Nice improvements!

We went from a single page (Quick Start) -> to 3-4 pages (no experiments, no metrics yet, no import/get) -> 7-8 pages with import/get, metrics -> to 3-4 that combine all that stuff -> to now trails since we want to make experiments a bit more of first-class citizen. In my experience - I didn't see much difference so far in all of those.

👍 I think these changes were warranted as relatively quick fixes to confusion that was identified by both internal feedback and from conversations from users after the recent changes, but agreed we don't need to spend much more time here, since we have spent a lot of iterations already and haven't seen much evidence it has major impact.

Also, clearly pipelines is a topic on which we don't have a clear decision / discussion yet.

Yeah, this clearly surfaced some larger product questions around pipelines that go beyond docs/get started IMO. I'll follow up on this but maybe let's wait a few days for everyone to have some time to reflect on it.

@shcheklein
Member

👍 I think these changes were warranted as relatively quick fixes to confusion that was identified by both internal feedback and from conversations from users after the recent changes, but agreed we don't need to spend much more time here, since we have spent a lot of iterations already and haven't seen much evidence it has major impact.

yep, some of them were quick fixes, some of them (like trails, or migration from one page to a reproducible project) were pretty large and planned activities

@dberenbaum
Collaborator

Sorry, I meant this PR was warranted because it was mostly about relatively quick fixes. I didn't mean to suggest all the get started work we have done was quick fixes 😄 .

4 participants