This repository has been archived by the owner on Aug 10, 2024. It is now read-only.

GTO docs #199

Merged
merged 42 commits into main from gto-docs
Nov 23, 2022

Conversation

Contributor

@aguschin aguschin commented Oct 24, 2022

@aguschin aguschin self-assigned this Oct 24, 2022
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 10:32 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 10:37 Inactive
@github-actions

github-actions bot commented Oct 24, 2022

803e7cf

Link Check Report

2/46 links failed.


@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 10:47 Inactive
@francesco086
Contributor

Perhaps my first question would be: why write the GTO docs as part of the MLEM documentation?
From what I can tell, it's because GTO is necessary to build a model registry. The same goes for DVC, but DVC has its own documentation site.

Where am I going with this question? I guess the "why" behind writing this is to enable users to build a full-featured model registry, which comes from the combined use of 3 tools: MLEM, DVC, and GTO.

If I am right, and this is the goal, then I would suggest not writing a general introduction to GTO here; that should live in GTO's own documentation. Rather, directly answer "how do I build a full-featured model registry using DVC, MLEM, and GTO?".

@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 12:50 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 13:01 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv October 24, 2022 13:07 Inactive
@@ -0,0 +1,10 @@
# Using GTO Commands

GTO is a command line tool. Here, we provide the specifications, complete
Contributor Author

@aguschin aguschin Oct 25, 2022

Thanks @francesco086. This is a good point of view. I'm opening a thread based on your comment to keep the discussion about it in a single place.

Minor: If you check out DVC docs, you'll see there are docs for Studio and DVClive. We can put this "GTO documentation" there or here (I used mlem.ai because it was easier for me), or on a separate website, like iterative.ai/doc maybe? Not sure. We need some place to keep GTO docs anyway.

Major: explaining how to build a registry with DVC+GTO+MLEM. Good question where to put that. In this PR you can see I was going to put the answers in /doc/gto/user-guide. I guess the Tutorial format would be the best for this, and we could add it to each product involved under Use Cases (e.g. here it can go next to or instead of "Pure MLEM Model registry"):
image

The other option is to create a GS with this - but that would be way too heavy for Get Started. I guess a Tutorial or blog post serves the purpose better.

Another place to have this is Model Registry page in Studio docs. But, not sure yet how UI (Studio) and CLI (GTO+DVC+MLEM Tutorial) could co-exist here. Maybe cross-links are a better approach than having this in Studio docs.

Again, good topic to think about 🤔 We also leave CML out of the picture above, it also can be a part of a MR...

@tapadipti, have you had any discussion about setting up a DVC+GTO+MLEM Tutorial to complement Studio docs? Looks like it's much needed, but I can't see that we ever created something like that.

Contributor

@jorgeorpinel jorgeorpinel Oct 26, 2022

I think it's a good place to put GTO docs if we want docs beyond a CMD/API ref (otherwise we could do with a README and possibly a site like https://docs.iterative.ai/dvc-task/reference/dvc_task/).

Major: explaining how to build a registry with DVC+GTO+MLEM. Good question where to put that...
Tutorial format would be the best for this

We mention it very high-level in https://mlem.ai/doc/use-cases/model-registry now. And there's the https://iterative.ai/model-registry solution page separately. I'm not sure how much we want to go into the details of this 3-way integration. May be a good blog topic indeed. Let's create a separate issue to discuss that, though?

https://iterative.ai/model-registry should have links to all relevant docs pages. But since the docs can't reside there, Studio docs look like the next best place to me for explaining how to build a registry with DVC+GTO+MLEM. We could create a Use cases section. But depending on how much and what content we need, a blog post may also suffice. And docs specific to the GTO cli should definitely be separate.

If you check out DVC docs, you'll see there are docs for Studio and DVClive.

This is to be changed. We will host Studio docs separately in its own docs site (like CML) - although we don't have dates for this yet.

Contributor Author

Ok, so I'm trying to draft that blog post - please see https://www.notion.so/iterative/Tutorial-Model-Registry-in-Git-with-DVC-MLEM-and-GTO-af124368ce9f4523a568a7e1875c7af3 - high-level feedback would be appreciated.

Thanks @aguschin. I've left some comments in the draft blog post.

@jorgeorpinel jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label Oct 26, 2022
@jorgeorpinel
Contributor

close iterative/gto#293

Prob still need to update the README as well? 🙂 (no rush)

yarn.lock: outdated review thread (resolved)
@jorgeorpinel jorgeorpinel changed the title GTO docs [WIP] GTO docs Oct 26, 2022
@jorgeorpinel jorgeorpinel marked this pull request as draft October 26, 2022 20:36
Contributor

@jorgeorpinel jorgeorpinel left a comment

If you check out DVC docs, you'll see there are docs for Studio and DVClive

I think we should follow something similar to https://dvc.org/doc/dvclive here:

  • Short docs home page (links to installation in README, for example);
  • Get Started (single page);
  • Technical reference (commands in the case of GTO)

Everything else may be overkill here. Please let's avoid the situation we have in MLEM in general with too many docs we can't properly finish 🙂

Comment on lines 9 to 10
by creating Git tags of [special format](/doc/gto/user-guide) and managing
[`artifacts.yaml` metafile](/doc/gto/user-guide). Since committing large files
Contributor

Instead of a big User Guide, for now we can have a single guide page explaining these formats and their mechanics (again, similar to DVCLive's Folder Structure doc).

Contributor Author

Thanks for all your feedback. We were releasing MLEM 0.3.0, so I was away from this for a bit. In general:

  1. I processed your feedback - thanks! - feel free to bring more
  2. Let's focus on GS, but I'll work on other things while I wait for you
  3. Let's see if the UG can fit on a single page. I don't want to complicate things or write extra material, but we may need to split it into subpages.

Contributor

@jorgeorpinel jorgeorpinel left a comment

Get Started review 👇🏼 Let's focus on this first? In fact, splitting the PR would be ideal IMO.

Comment on lines 28 to 38
This repo represents a simple example of Machine Learning Model Registry. Let's
review it:

```cli
$ gto show
╒══════════╤══════════╤════════╤═════════╤════════════╕
│ name     │ latest   │ #dev   │ #prod   │ #staging   │
╞══════════╪══════════╪════════╪═════════╪════════════╡
│ churn    │ v3.1.1   │ v3.1.1 │ v3.0.0  │ v3.1.0     │
│ segment  │ v0.4.1   │ v0.4.1 │ -       │ -          │
│ cv-class │ v0.1.13  │ -      │ -       │ -          │
```
Contributor

👍🏼 👍🏼 👍🏼

I kind of like that we start by showing the end-result! It's a good way to deliver the value proposition quickly in here (main purpose of this doc).

content/docs/gto/get-started.md: 4 outdated review threads (resolved)
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv November 1, 2022 15:05 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv November 22, 2022 06:39 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv November 22, 2022 23:02 Inactive
Comment on lines +1 to +5
# Why GTO?

**GTO** is a tool for creating an Artifact Registry in your Git repository. One
of the special cases we would like to highlight is creating a
[Machine Learning Model Registry](/doc/use-cases/model-registry).
Contributor

@jorgeorpinel jorgeorpinel Nov 22, 2022

This whole page is also not in sidebar.json (minor?)

Contributor Author

yes

Contributor

Still secretly published in https://mlem.ai/doc/gto/why-gto.

Comment on lines 3 to 5
To create an Artifact Registry with GTO, you only need a Git repo and GTO
package installed. There's no need to set up any services or databases, compared
to many other Model Registry offerings.
Contributor

@jorgeorpinel jorgeorpinel Nov 23, 2022

Suggested change
- To create an Artifact Registry with GTO, you only need a Git repo and GTO
- package installed. There's no need to set up any services or databases, compared
- to many other Model Registry offerings.
+ You'll need [Python](https://www.python.org/) to install GTO, and
+ [Git](https://git-scm.com/) to use it.

Contributor Author

Not clear now why you need DB/Services at all - if we talk about GTO installation, let's remove all mentions of MR.

Contributor

I removed the part about DBs because it didn't seem too relevant to mention in the installation page, but it may make sense in other docs.

Not sure I understood your suggestion wrt MR mentions.

Contributor

@jorgeorpinel jorgeorpinel left a comment

A couple questions on whether we want to expose Git technicalities or not (apply to all docs, mainly cmd ref, but I'm only commenting on a couple pages).

UPDATE: Please ignore this for now...

  • We can address later. There are lower hanging fruit here.

content/docs/gto/command-reference/index.md: outdated review thread (resolved)
Comment on lines +3 to +4
Create an artifact version to signify an important, published or released
iteration.
Contributor

Maybe we should be more specific on what it does, like:

Suggested change
- Create an artifact version to signify an important, published or released
- iteration.
+ Create a Git tag containing the artifact's name and version.

Not sure, e.g. in DVC sometimes we keep it general (https://dvc.org/doc/command-reference/commit) and sometimes specific (https://dvc.org/doc/command-reference/checkout).

I guess it depends on whether we expect users to be familiar enough with Git and/or whether we want them to keep the mechanics in mind. But if we consider these implementation details, then let's keep it general and also remove/hide Git tag details everywhere else.
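
To make the "specific" wording concrete, here is a minimal sketch of what "a Git tag containing the artifact's name and version" could look like. It assumes version tags follow a `<name>@v<major>.<minor>.<patch>` layout (for example `churn@v3.1.1`, reusing a name and version from the `gto show` table quoted earlier). That layout and the helper names `parse_version_tag` and `make_version_tag` are illustrative assumptions, not GTO's actual implementation or API.

```python
import re
from typing import NamedTuple, Optional

# Assumed tag layout: "<artifact>@v<major>.<minor>.<patch>", e.g. "churn@v3.1.1".
# This is a hypothetical format used for illustration; check the GTO docs
# for the real tag conventions before relying on it.
VERSION_TAG = re.compile(r"^(?P<name>[\w\-/]+)@(?P<version>v\d+\.\d+\.\d+)$")


class ArtifactVersion(NamedTuple):
    name: str
    version: str


def parse_version_tag(tag: str) -> Optional[ArtifactVersion]:
    """Return (name, version) if `tag` matches the assumed layout, else None."""
    match = VERSION_TAG.match(tag)
    if match is None:
        return None
    return ArtifactVersion(match["name"], match["version"])


def make_version_tag(name: str, version: str) -> str:
    """Build a tag name from an artifact name and a version such as 'v3.1.1'."""
    tag = f"{name}@{version}"
    if parse_version_tag(tag) is None:
        raise ValueError(f"not a valid artifact version tag: {tag!r}")
    return tag
```

If the docs keep the general wording, a sketch like this would belong in the guide page on formats rather than in the command reference.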

Contributor Author

Since the pages are auto-generated, this requires changing the code. Will do that later.

Contributor

@jorgeorpinel jorgeorpinel left a comment

If we're mainly concerned with CI/CD let's be specific instead of saying just "downstream"? More concrete

content/docs/gto/get-started.md: 2 outdated review threads (resolved)
content/docs/gto/user-guide.md: 2 outdated review threads (resolved)
Contributor

@jorgeorpinel jorgeorpinel left a comment

Maybe we should dissolve the relatively long intro (reusing some of the text in each section). That way we get to something actionable faster:

content/docs/gto/get-started.md: 7 outdated review threads (resolved)
Contributor

@jorgeorpinel jorgeorpinel left a comment

Last batch of suggestions on Get Started:

content/docs/gto/get-started.md: 6 outdated review threads (resolved)
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv November 23, 2022 07:10 Inactive
@shcheklein shcheklein temporarily deployed to mlem-ai-gto-docs-pzdfnkadkdwtv November 23, 2022 07:35 Inactive
@aguschin aguschin dismissed jorgeorpinel’s stale review November 23, 2022 07:44

merging the current docs to improve them later

@aguschin aguschin merged commit 97ada32 into main Nov 23, 2022
@aguschin aguschin deleted the gto-docs branch November 23, 2022 08:28
@jorgeorpinel jorgeorpinel added the ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement label Dec 2, 2022
Labels
A: docs Area: user documentation (gatsby-theme-iterative) ✨ epic Placeholder ticket for multi-sprint direction, use story, improvement
Development

Successfully merging this pull request may close these issues.

Create GTO docs
8 participants