Semi-autogenerated docs #171

mike0sv · 2022-09-03T08:40:28Z

There are a number of inconsistencies between the mlem docs and the actual mlem code. Sometimes it's because of new features that we forgot to document, sometimes it's fixes in the docs that are not reflected in the mlem code. To make everything as consistent as possible, I suggest auto-generating everything we can.
Of course, a big chunk of docs will remain hand-crafted. I am talking about parts of reference pages for API, CLI and upcoming Objects.

Ideal process:

for a specific part of the docs (cli/api/etc) a specification is generated from the mlem codebase in the form of a json file. It contains all docs-related stuff from the code (docstrings, help messages etc). It's generated from the latest mlem version in CI
in .md files a special "generate" expression is used (kind of like this)
in CI (or maybe even in realtime) those expressions are substituted with the actual docs generated from the spec
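
As a rough illustration of the first step, here is a minimal sketch of collecting docs-related data from code into a json spec. It uses only the standard library; `build_spec`, `demo_save` and `demo_load` are hypothetical names made up for this example and are not part of mlem, and the real spec layout may look quite different.

```python
import inspect
import json


def build_spec(functions):
    """Collect docs-related data (signature, docstring) into a plain dict."""
    spec = {}
    for func in functions:
        spec[func.__name__] = {
            "signature": str(inspect.signature(func)),
            "doc": inspect.getdoc(func) or "",
        }
    return spec


# Hypothetical stand-ins for real API functions.
def demo_save(obj, path):
    """Save an object to the given path."""


def demo_load(path):
    """Load an object from the given path."""


# sort_keys keeps the file stable between runs, so diffs stay small.
spec_json = json.dumps(build_spec([demo_save, demo_load]), indent=2, sort_keys=True)
print(spec_json)
```

Keeping the spec as plain json means the docs side does not need to import the code it documents.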

For now:

same as above, but manually (not CI)
.md files stay the same
a special script finds the parts of .md files that should be autogenerated and replaces their contents with text generated from the spec
the script runs locally, the final .md is committed
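
The local script described above could be as small as the sketch below. The `<!-- generate:KEY -->` / `<!-- /generate -->` marker syntax, the `regenerate` helper and the sample page are assumptions for illustration only; the issue does not pin down the actual expression format.

```python
import re

# Assumed marker syntax (not taken from mlem):
#   <!-- generate:KEY -->
#   ...generated text...
#   <!-- /generate -->
BLOCK = re.compile(
    r"(?P<open><!-- generate:(?P<key>[\w.-]+) -->\n)"
    r"(?P<body>.*?)"
    r"(?P<close><!-- /generate -->)",
    re.DOTALL,
)


def regenerate(markdown, generated):
    """Replace the body of every marked block with generated[key]."""

    def substitute(match):
        key = match.group("key")
        if key not in generated:
            return match.group(0)  # unknown key: keep the existing text
        body = generated[key].rstrip("\n") + "\n"
        return match.group("open") + body + match.group("close")

    return BLOCK.sub(substitute, markdown)


# Hypothetical page: hand-written text around one generated block.
page = (
    "# mlem cmd\n\n"
    "Hand-written intro.\n\n"
    "<!-- generate:cmd.options -->\n"
    "- `--old`: stale option\n"
    "<!-- /generate -->\n\n"
    "Hand-written examples.\n"
)
fresh = {"cmd.options": "- `--help`: Show this message and exit."}
updated = regenerate(page, fresh)
print(updated)
```

Everything outside the markers is left untouched, and running the script twice gives the same file, so the committed .md only changes when the spec does.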

I will start with CLI for iterative/mlem#363 and create a PR shortly with examples

shcheklein · 2022-09-04T05:52:57Z

This has been discussed a few times before (iterative/dvc.org#2770).

My take - this approach creates pretty bad docs (if you could run mlem something --help and get the same result, you don't need docs at all) or requires significant maintenance (proper docstrings that are actually hard to write so that they satisfy all the requirements that we have for docs - e.g. admons, code blocks, etc, etc).

shcheklein · 2022-09-04T05:54:14Z

Even for the python API (which makes more sense to me to generate), afair the team decided to keep things very simple in the code and put longer descriptions / examples in the docs.

mike0sv · 2022-09-05T05:50:03Z

Yes, the idea is to force the simple part to be the same in docs and in code. Longer descriptions and examples will be only in docs with all those fancy md things.
We already have tests in code for all classes, options and fields to have docstrings.
Just doing the PR showed that 1) we had a couple of commands left out of the docs because we forgot to add them, 2) we had a couple of cli options in the docs that we had deleted from the code, 3) the docs team fixed a lot of wording/spelling/punctuation in the docs that was not backported to the code (I did the backporting manually in iterative/mlem#363).
And you can see that besides those discrepancies, the PR didn't actually change anything else, like formatting. So it should be the best of both worlds: handcrafted docs with automation that checks if they are up to date
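
For readers unfamiliar with this kind of test, a check that everything public has a docstring can be sketched as below. This is a generic illustration, not mlem's actual test suite; `missing_docstrings` and the sample functions are hypothetical.

```python
import inspect


def missing_docstrings(namespace):
    """Return names of public functions and classes that have no docstring."""
    missing = []
    for name, obj in sorted(namespace.items()):
        if name.startswith("_"):
            continue  # private names are not part of the documented surface
        is_documentable = inspect.isfunction(obj) or inspect.isclass(obj)
        if is_documentable and not (obj.__doc__ or "").strip():
            missing.append(name)
    return missing


# Hypothetical sample objects.
def documented():
    """Does something."""


def undocumented():
    pass


sample = {"documented": documented, "undocumented": undocumented, "_private": undocumented}
print(missing_docstrings(sample))
```

In a real test the namespace would be something like `vars(some_module)`, and the test would assert that the returned list is empty.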

mike0sv · 2022-09-05T05:57:09Z

From iterative/dvc.org#2770 (comment) : my intention is exactly what @casperdcl wrote in the end: generate subset of (2) from (1)

shcheklein · 2022-09-05T06:00:25Z

> had couple of commands left out of the docs because we forgot to add them

> had a couple of cli options in docs that we deleted from code

This can be solved by introducing a check. No need to generate or keep source code as a source for docs. I'm not sure how valuable everything else is. Tbh from my experience it's still quite rare that we would even benefit from checks like those.

> docs team fixed a lot of wording/spelling/punctuation in docs that was not backported to code (I did the backporting manually in iterative/mlem#363)

This is minor. Usually the major work is writing a proper description of those options. The point here is: if they are the same as --help and auto-generated, you don't need them at all. In DVC they are far from being the same.

> generate subset of (2) from (1)

I'm not sure it's possible tbh, unless I'm missing something. Usually 2 looks quite different from 1.

casperdcl · 2022-09-05T12:52:07Z

I think it could be helpful to have a basic CI check that e.g. cml <command> --help lists the same options as show up in the bullet points in https://cml.dev/doc/ref/<command>#options for example...
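
A basic version of such a check can be sketched with plain text comparison. The help output and the docs snippet below are invented samples, and `long_options` / `compare` are hypothetical helper names; a real check would read the actual `--help` output and the actual reference page.

```python
import re

# A long option: two dashes, not preceded by a word character or another dash.
OPTION = re.compile(r"(?<![\w-])--[a-z][\w-]*")


def long_options(text):
    """Return the set of long options (--like-this) mentioned in text."""
    return set(OPTION.findall(text))


def compare(help_text, docs_text):
    """Return (missing_from_docs, stale_in_docs) as sorted lists."""
    in_help = long_options(help_text)
    in_docs = long_options(docs_text)
    return sorted(in_help - in_docs), sorted(in_docs - in_help)


# Invented sample inputs.
help_text = """Usage: tool cmd [OPTIONS]

Options:
  --output PATH  Where to write the result.
  --verbose      Print more details.
  --help         Show this message and exit.
"""
docs_text = """## Options

- `--output <path>` - where to write the result.
- `--force` - overwrite existing files.
- `--help` - show this message and exit.
"""
missing, stale = compare(help_text, docs_text)
print("missing from docs:", missing)
print("stale in docs:", stale)
```

This only compares option names, so hand-written descriptions in the docs can diverge from the help text as much as needed.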

mike0sv · 2022-09-05T19:30:45Z

The check you are talking about is almost the same as what I propose. There are 2 ways to implement this check: parse the existing options section, find what options are there and compare with what --help has (extracting them from the typer (click) api is even easier), or generate this section from code and compare with the existing text. The second approach allows reusing the same code and avoids rewriting all of this manually. If you are not happy with what was generated, you can always fix the text in the docstrings or the formatting in the generator code
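
The second way (generate the section and compare it with the existing text) can be sketched like this. The bullet format, the spec entries and the helper names `render_options` / `drift` are assumptions for illustration, not what #172 actually does.

```python
import difflib


def render_options(options):
    """Render a list of {"name", "help"} dicts as markdown bullet lines."""
    return [f"- `{opt['name']}`: {opt['help']}" for opt in options]


def drift(existing_section, options):
    """Return unified-diff lines between the committed section and the generated one."""
    generated = render_options(options)
    existing = existing_section.strip().splitlines()
    return list(
        difflib.unified_diff(existing, generated, "docs", "generated", lineterm="")
    )


# Hypothetical spec entries and a committed section with a typo in it.
spec_options = [
    {"name": "--output", "help": "Where to write the result."},
    {"name": "--help", "help": "Show this message and exit."},
]
committed = (
    "- `--output`: Where to write the result.\n"
    "- `--help`: Show this mesage and exit.\n"
)
diff = drift(committed, spec_options)
print("\n".join(diff))
```

An empty diff means the section is up to date; a non-empty one can either fail the check or be applied as the new section content.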

mike0sv · 2022-09-05T19:32:22Z

> I'm not sure it's possible tbh, unless I'm missing something.

Mmm probably you are. I'm not talking about something like mlem cmd --help > cli-reference/cmd.md
Please take a look at #172

shcheklein · 2022-09-06T04:40:00Z

@mike0sv how do you envision the workflow for this though? It should be a check that you run regularly anyway, and then either you generate the boilerplate automatically as a PR or fix it manually. If you don't automate this, then who is responsible for running it?

Anyways, my point is that from my experience this takes time to automate, takes time to maintain, etc, etc and in case of DVC was not solving much. Most of the work goes into writing meaningful option descriptions (neither --help nor docstrings give them).

> Mmm probably you are. I'm not talking about something like mlem cmd --help > cli-reference/cmd.md

I understand that it doesn't generate the whole md file, it generates some parts of it, right? (not sure if it keeps or not options that already exist). And that's exactly what I was talking about- It drives bad docs to my mind.

For every PR in code that changes API / CLI we should be creating a proper PR with docs update. It should have examples, proper description (--help doesn't give it). This process guarantees that we have meaningful docs. Automation can help to check for discrepancies (e.g. run by cron) or bootstrap it the first time (similar to #172).

omesser · 2022-09-14T17:46:44Z

@shcheklein I understand the concern about docstrings contents restricting the doc site content 🙏 we discussed this offline as well. But it doesn't have to be this way imo. So it's very possible to achieve some automation here without any "new" workflow that would reduce quality.
The current alternative is that things become obsolete or are just plain dropped and forgotten, so I think this is undoubtedly worse 😄

> And that's exactly what I was talking about- It drives bad docs to my mind.

I think it doesn't have to. The generated docs are definitely better than nothing, even if they are just a skeleton for more examples / fleshed out content which requires time and attention. So it's not against that, but automating the repetitive content at least

This is the way I see it at least. So I do suggest we give this a try. The dvc docs are more stable and 99% of the effort goes to handcrafted content, but mlem is at a different stage and things are more dynamic, so this can potentially help guard us from drift between the docsite and the tool.

For this to be effective I also think we want to automate this somehow: run it in a cronjob and generate a suggestion PR every week or so. It would be a good reminder, and even if the PR is not mergeable and we need a human in the loop, it can provide the skeleton for the changes

shcheklein · 2022-09-14T22:30:50Z

TL;DR: I'm fine to automate and try (but keep in mind we are spending time on this :) ).

> The generated docs are definitely better than nothing, even if they are just a skeleton for more examples / fleshed out content which requires time and attention. So it's not against that, but automating the repetitive content at least

Yes. But this is pretty much about bootstrapping? Once the project is more or less stable, I find it hard to justify this level of automation (I mean making more and more sophisticated scripts to merge / embed, etc, etc). Everything can be done, but it has its own cost, while 99% of the time in docs goes into writing content. Manually creating a PR that just copy-pastes things when you change a command is not painful at all unless you change something every day (I doubt that will be happening).

A bit of reflection on my approach / my thoughts.

Personal perception. I'm quite annoyed when I come to docs and the only thing I see is a copy-paste of some existing content (that I already got in an IDE or in the CLI). My feeling is exactly like that: "folks automated and forgot about this since it's good enough". My feeling usually is that the creators don't try to make my life easier.
It creates a false feeling of completeness / existence of docs, and there will be less incentive to allocate time to improving them. I hope an alternative can be really lightweight (e.g. you do one option per week, one small document per week, etc) and it can get us very far.
Writing is an essential and super important skill for every engineer.

casperdcl · 2022-09-15T00:39:20Z

I agree the automation might make more sense only for unique tools like CML where most pple don't download it to run --help locally. But even CML's online command ref goes a bit further than the CLI output... it has better markdown formatting, hyperlinks & URLs to more info, etc.

I'd only automate checking that all subcommands and --options exist in the command ref, but not checking the descriptions/wording.

ryanjdillon · 2023-07-03T08:37:43Z

As a potential new user, I find the Python API docs on mlem.ai difficult to work with, as they are not up to date, and could benefit from further typehinting.

For example:
The docs in the code have corrected typos, which make them more intelligible, and only by looking there could I find that fs is defined by fsspec and see what filesystems are supported.

mlem.api.save on mlem.ai
mlem.api.save in code

While I understand this requires some additional dev work, it may be worth the prioritization. In my case, I am evaluating using mlem/dvc/gto for a model registry, after which I'd like to evaluate Iterative Studio, but I need to get through the docs first ;)

mike0sv added the type: enhancement Something is not clear, small updates, improvement suggestions label Sep 3, 2022

mike0sv mentioned this issue Sep 3, 2022

Semi-autogenerated cli reference #172

Merged

jorgeorpinel added the A: docs Area: user documentation (gatsby-theme-iterative) label Sep 6, 2022

omesser changed the title ~~Semi-autogerated docs~~ Semi-autogenerated docs Sep 13, 2022

mike0sv mentioned this issue Sep 16, 2022

Docs for extensions #179

Closed

3 tasks

jorgeorpinel mentioned this issue Sep 22, 2022

Document dvcfs API iterative/dvc.org#3927

Closed

jorgeorpinel mentioned this issue Oct 10, 2022

docs improvements iterative/cml.dev#250

Open

28 tasks
