Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Support code-snippet extraction and docs-updating #3948

Merged
merged 1 commit into from
Feb 16, 2024

Conversation

chalin
Copy link
Contributor

@chalin chalin commented Feb 6, 2024

What's this PR about?

  • Contributes to Add tooling in support of auto-update of code excerpts in docs #1635 by setting up the tooling and providing a simple example of use.
  • This PR illustrates the use of code-excerpt extraction from source code, and the syncing of code excerpts in doc pages.
  • I've used Go's Getting started code as an example, annotating three code blocks for auto-syncing.
    Notes: I only marked three code blocks for syncing. Of these:
    • The tooling found a single line mismatch (a deleted comment)
    • No changes were required of the Go source code for these particular code blocks to sync.
  • I've illustrated the most basic code synching functionality. If maintainers agree to move forward with this approach, I'll progressively illustrate the capabilities of the tooling as I set up the syncing of more code blocks.

How it works

  • Just add <?code-excerpt ...?> directives before code blocks to have the code snippet synced. That's it, the tools do the rest once configured.
  • Code excerpts get extracted from the Go code repo, which is included as a submodule.

How can I try this myself? (For maintainers/approvers only)

  • As indicated in tools/README.md: install the Dart SDK if you don't have it already.
  • Run: npm run code-excerpts
    This does some preprocessing of the code excerpts and then syncs the docs -- both substeps are independent. Once primed, that full prep and sync takes less than 6 seconds on my machine, and that's mostly tooling overhead. The tools scale very nicely in the number of code excerpts and pages processed.

PR details

  • Sets up tooling in support of code-excerpt extraction from sources, and syncing of code excerpts in doc pages.
  • Adds two new directories. For details, see their respective README files.
    • examples
    • tools
  • Adds NPM scripts: code-excerpts, code-excerpts:get, and code-excerpts:update-docs
  • Adds opentelemetry-go as a submodule source to have access to example/dice code

⚠️ Important: Consider this feature in beta, and for repo maintainers/approvers only.

For later

  • This PR do not hook up code-excerpt update into CI yet.
  • Prettier doesn't play nice with the SGML directive, so I've temporarily added Prettier ignore directives. I plan on eliminating that requirement soon.
  • Symlinks are used to centralize directories used by the code-excerpt tooling. (That won't be a problem for Linux and macOS environments. It may not play nice under Windows. I could get around this issue, but would rather not have to invest time in that atm.)

@chalin chalin added the CI/infra CI & infrastructure label Feb 6, 2024
@chalin chalin requested review from a team February 6, 2024 11:41
@chalin chalin force-pushed the chalin-code-excerpt-2024-02-04 branch 2 times, most recently from 1a1025a to fee3614 Compare February 6, 2024 11:48
@pellared
Copy link
Member

pellared commented Feb 6, 2024

@chalin FYI related: #3949

@cartermp
Copy link
Contributor

cartermp commented Feb 6, 2024

The issue with this approach is we're going to now have 11+ additional sources of contributors messing up submodules. In practice I think most contributions are in the text and not code snippets, but the introduction of that many submodules means we're probably going to have commit to cleaning up peoples' branches for them. Submodules are difficult to work with and easy to screw up.

@chalin
Copy link
Contributor Author

chalin commented Feb 6, 2024

... Submodules are difficult to work with and easy to screw up.

Contributors can now simply run npm run sync to sync the submodules in their PR branches, and commit the changes if any. For details, see the thread starting at #3149 (comment).

As @svrnm pointed out during the last Comms meeting, the issue isn't how many submodules we have, it's whether we have submodules at all. Since we're pretty much stuck with submodules, having a few more won't be an extra burden; and besides, IMHO npm run sync will go a long way towards alleviating the pains of submodule syncing.

As maintainers the question we need to ask is whether we want #1635 solved in the repo or not. If we do, then this PR offers one of the best solutions I know of.

/cc @theletterf

@chalin chalin force-pushed the chalin-code-excerpt-2024-02-04 branch from fee3614 to 0cafa2e Compare February 6, 2024 14:27
@svrnm
Copy link
Member

svrnm commented Feb 6, 2024

Big fan of that, thank you for tackling this @chalin!

... Submodules are difficult to work with and easy to screw up.

Contributors can now simply run npm run sync to sync the submodules in their PR branches. For details, see the thread starting at #3149 (comment).

As @svrnm pointed out during the last Comms meeting, the issue isn't how many submodules we have, it's whether we have submodules at all. Since we're pretty much stuck with submodules, having a few more won't be an extra burden; and besides, IMHO npm run sync will go a long way towards alleviating the pains of submodule syncing.

First: +1 for npm run sync fixing 80% of that, I used it the other day and it does a perfect job, maybe we can turn this into a /fix:sync as well?

Yeah, in my experience it's not one submodule that contributors screw up, it's "all or nothing". Although I agree that adding another 11+ submodules may come with additional challenges, e.g. how much is this slowing down the build process, etc. But this is an issue to worry about later, since we start with one repo now...

Also, I am wondering, how much do contributors need to interact with that? So, if I fix some typos or words in the docs, do I even need to install&use that tooling? Or, even better, could we package all of that in a github workflow that regularly fetches the code repos, executes that code and then raises the appropriate PR?

As maintainers the question we need to ask is whether we want #1635 solved in the repo or not. If we do, then this PR offers one of the best solutions I know of.

I want this to be solved: we are software developers creating software for software developers to make it easier for them to fix problems in their code. So there are rightfully some high expectations for our stuff (including the docs) to work. Also, with our docs still growing quicker than we onboard new docs contributors we need every tooling we can have to keep us afloat.

One organizational note:
One key element for this to be successful is that we setup a proper "contract" between us and the code-owning SIGs. We need some guarantees that this code will live at a certain place in their repository and that it does certain things consistently across languages (e.g. for the getting started the roll the dice app should always run on port 8080, have /rolldice as endpoint, ...). So while trying this out with Go, we might consider establishing some documentation & practices on that (I am wondering if it may help to have @ docs-approvers/maintainers as CODEOWNERS for those files)

@cartermp
Copy link
Contributor

cartermp commented Feb 6, 2024

For this:

the issue isn't how many submodules we have, it's whether we have submodules at all.

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many. I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations. Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

@chalin
Copy link
Contributor Author

chalin commented Feb 6, 2024

@svrnm wrote:

turn this into a /fix:sync as well?

Great idea! Let's track that via:

Also, I am wondering, how much do contributors need to interact with that?

Initially, no interaction at all.

So, if I fix some typos or words in the docs, do I even need to install&use that tooling?

No.

Or, even better, could we package all of that in a github workflow that regularly fetches the code repos, executes that code and then raises the appropriate PR?

Yes.

One key element for this to be successful is that we setup a proper "contract" between us and the code-owning SIGs.

Right.

We're trying to solve an inherently challenging problem. I propose that we approach this incrementally. First, let's agree on the base tooling (even if it changes later) so that we can start with something concrete. Then find a SIG willing to have us sync their code. Then incrementally bring in the automation, adjusting as we go along.

My proposal is that contributors not have to interact with this at all in the beginning. We maintainers/approvers would manually run the code-excerpt script when needed -- e.g., to handle situations like #3949 -- and commit changes.

Eventually we'll need to educate contributors that code changes need to happen at the source. There are ways to mitigate this (that I can share at our next Comms meeting), but IMHO we should stay focus on first steps.

@cartermp wrote:

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many.

You're right, of course: the number of submodules does have an incremental impact.

I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations.

That's the plan.

Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

Yup, and we've been trying to do our best to keep all stakeholders as happy as possible (#2448, #3149). There are inherent complexities in the problem we're trying to manage (supporting the build, checking and deployment of a non-trivial tech-docs website with parts spread across repos) beyond "just writing docs". E.g., submodules can be a pain to deal, but most of the time IMHO, they help cleanly solve the problem of keeping content across repos in sync.

@svrnm
Copy link
Member

svrnm commented Feb 7, 2024

<?code-excerpt "otel.go" from="package main"?>

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

{{% code-excerpt "otel.go" from="package main" %}}

We're trying to solve an inherently challenging problem. I propose that we approach this incrementally. First, let's agree on the base tooling (even if it changes later) so that we can start with something concrete. Then find a SIG willing to have us sync their code. Then incrementally bring in the automation, adjusting as we go along.

My proposal is that contributors not have to interact with this at all in the beginning. We maintainers/approvers would manually run the code-excerpt script when needed -- e.g., to handle situations like #3949 -- and commit changes.

Eventually we'll need to educate contributors that code changes need to happen at the source. There are ways to mitigate this (that I can share at our next Comms meeting), but IMHO we should stay focus on first steps.

+1 for this approach, in the worst case(!) we can pull back and revert that tooling. We should gain some experience with it and see if it adds more value vs more issues and then make a judgement. Maybe we need to onboard 2 to 3 SIGs to make a final decision if we want to go all in or not. Let's assume we take 1-2 months to add one more SIG we might be able to go from "beta" to rolling this out in ~3-4 months. WDYT?

@cartermp wrote:

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many.

You're right, of course: the number of submodules does have an incremental impact.

I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations.

That's the plan.

Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

Yup, and we've been trying to do our best to keep all stakeholders as happy as possible (#2448, #3149). There are inherent complexities in the problem we're trying to manage (supporting the build, checking and deployment of a non-trivial tech-docs website with parts spread across repos) beyond "just writing docs". E.g., submodules can be a pain to deal, but most of the time IMHO, they help cleanly solve the problem of keeping content across repos in sync.

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

@chalin
Copy link
Contributor Author

chalin commented Feb 9, 2024

@svrnm wrote:

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

Eventually, but for now I'd like to focus on using the tools rather than customizing them. I have further improvements in mind, but again, I'd rather we gain experience with existing tooling for starters.

+1 for this approach, in the worst case(!) we can pull back and revert that tooling. We should gain some experience with it and see if it adds more value vs more issues and then make a judgement. Maybe we need to onboard 2 to 3 SIGs to make a final decision if we want to go all in or not. Let's assume we take 1-2 months to add one more SIG we might be able to go from "beta" to rolling this out in ~3-4 months. WDYT?

👍🏻

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

Sure, but with the magic and ease of npm run sync, I'd say that there's no extra burden on regular contributors. So I'd avoid the extra cost (for now) of conditional / secondary submodules.

WDYT?

@svrnm
Copy link
Member

svrnm commented Feb 9, 2024

@svrnm wrote:

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

Eventually, but for now I'd like to focus on using the tools rather than customizing them. I have further improvements in mind, but again, I'd rather we gain experience with existing tooling for starters.

+1

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

Sure, but with the magic and ease of npm run sync, I'd say that there's no extra burden on regular contributors. So I'd avoid the extra cost (for now) of conditional / secondary submodules.

If we start with one repo (=> go), I have no concern with adding this one more sub-module for now. Let's make it work with go, and then keep it going for a few weeks and see how many things break.

@pellared , @cartermp: wdyt?

@cartermp
Copy link
Contributor

cartermp commented Feb 9, 2024

So one thing that this has caused me to think about is adding an action for a "please fix all my shit". So it fixes formatting, the refcache, runs npm sync, etc. Maybe if we not only have the actions in place, but also a single place to mostly fix most things, then that could be enough to ease my unease around the contributor experience.

@svrnm
Copy link
Member

svrnm commented Feb 12, 2024

So one thing that this has caused me to think about is adding an action for a "please fix all my shit". So it fixes formatting, the refcache, runs npm sync, etc. Maybe if we not only have the actions in place, but also a single place to mostly fix most things, then that could be enough to ease my unease around the contributor experience.

Like a fix all?

--> https://github.com/open-telemetry/opentelemetry.io/blob/main/.github/workflows/pr-actions.yml#L82

It does not have all the fixes yet, but updating that workflow file has become much easier now. So that should hopefully help a lot.

Another thing we could do is adding a workflow that checks CI results and let's people know that they can trigger those actions, or that they should not worry about them too much and that we can fix that for them before merging.

@hughesjj
Copy link

Another thing we could do is adding a workflow that checks CI results and let's people know that they can trigger those actions, or that they should not worry about them too much and that we can fix that for them before merging.

I prefer this approach given it's more "context-aware" and proactive to the contributors issue/flow, rather than "oh read the doc"

Copy link

@hughesjj hughesjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm other than one concern/question

@@ -43,6 +43,9 @@
"check:text": "npm run _check:text -- ",
"check": "npm run seq -- $(npm run -s _list:check:*)",
"clean": "make clean",
"code-excerpts": "rm -Rf tmp/excerpts/* && npm run seq -- code-excerpts:get code-excerpts:update-docs",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make sure: the other tmp references are ../tmp and this one is just tmp, is that expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is expected, but thanks for noticing: the other commands do a cd tools first, hence the leading ../.

Copy link
Member

@svrnm svrnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed yesterday during the SIG call we want to proceed with this.

@svrnm
Copy link
Member

svrnm commented Feb 13, 2024

@chalin update and merge this PR whenever you are ready

@chalin chalin force-pushed the chalin-code-excerpt-2024-02-04 branch from 2fe8b61 to 234cf67 Compare February 16, 2024 00:28
@chalin chalin merged commit b1712ad into open-telemetry:main Feb 16, 2024
14 checks passed
@chalin chalin deleted the chalin-code-excerpt-2024-02-04 branch February 16, 2024 00:30
@chalin chalin mentioned this pull request Feb 16, 2024
8 tasks
@svrnm
Copy link
Member

svrnm commented Feb 16, 2024

awesome, thanks @chalin! Now let's see how this works in the future.

@chalin
Copy link
Contributor Author

chalin commented Feb 16, 2024

@svrnm - I'm OOO next week, but will dive into using this again (probably for the demo) when I'm back (FYI)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/infra CI & infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants