[CI] Support code-snippet extraction and docs-updating #3948

chalin · 2024-02-06T11:41:53Z

What's this PR about?

Contributes to Add tooling in support of auto-update of code excerpts in docs #1635 by setting up the tooling and providing a simple example of use.
This PR illustrates the use of code-excerpt extraction from source code, and the syncing of code excerpts in doc pages.
I've used Go's Getting started code as an example, annotating three code blocks for auto-syncing.
Notes: I only marked three code blocks for syncing. Of these:
- The tooling found a single line mismatch (a deleted comment)
- No changes were required of the Go source code for these particular code blocks to sync.
I've illustrated the most basic code synching functionality. If maintainers agree to move forward with this approach, I'll progressively illustrate the capabilities of the tooling as I set up the syncing of more code blocks.

How it works

Just add <?code-excerpt ...?> directives before code blocks to have the code snippet synced. That's it, the tools do the rest once configured.
Code excerpts get extracted from the Go code repo, which is included as a submodule.

How can I try this myself? (For maintainers/approvers only)

As indicated in tools/README.md: install the Dart SDK if you don't have it already.
Run: npm run code-excerpts
This does some preprocessing of the code excerpts and then syncs the docs -- both substeps are independent. Once primed, that full prep and sync takes less than 6 seconds on my machine, and that's mostly tooling overhead. The tools scale very nicely in the number of code excerpts and pages processed.

PR details

Sets up tooling in support of code-excerpt extraction from sources, and syncing of code excerpts in doc pages.
Adds two new directories. For details, see their respective README files.
- examples
- tools
Adds NPM scripts: code-excerpts, code-excerpts:get, and code-excerpts:update-docs
Adds opentelemetry-go as a submodule source to have access to example/dice code

⚠️ Important: Consider this feature in beta, and for repo maintainers/approvers only.

For later

This PR do not hook up code-excerpt update into CI yet.
Prettier doesn't play nice with the SGML directive, so I've temporarily added Prettier ignore directives. I plan on eliminating that requirement soon.
Symlinks are used to centralize directories used by the code-excerpt tooling. (That won't be a problem for Linux and macOS environments. It may not play nice under Windows. I could get around this issue, but would rather not have to invest time in that atm.)

pellared · 2024-02-06T12:33:51Z

@chalin FYI related: #3949

cartermp · 2024-02-06T13:48:20Z

The issue with this approach is we're going to now have 11+ additional sources of contributors messing up submodules. In practice I think most contributions are in the text and not code snippets, but the introduction of that many submodules means we're probably going to have commit to cleaning up peoples' branches for them. Submodules are difficult to work with and easy to screw up.

chalin · 2024-02-06T14:26:59Z

... Submodules are difficult to work with and easy to screw up.

Contributors can now simply run npm run sync to sync the submodules in their PR branches, and commit the changes if any. For details, see the thread starting at #3149 (comment).

As @svrnm pointed out during the last Comms meeting, the issue isn't how many submodules we have, it's whether we have submodules at all. Since we're pretty much stuck with submodules, having a few more won't be an extra burden; and besides, IMHO npm run sync will go a long way towards alleviating the pains of submodule syncing.

As maintainers the question we need to ask is whether we want #1635 solved in the repo or not. If we do, then this PR offers one of the best solutions I know of.

/cc @theletterf

svrnm · 2024-02-06T14:48:37Z

Big fan of that, thank you for tackling this @chalin!

... Submodules are difficult to work with and easy to screw up.

Contributors can now simply run npm run sync to sync the submodules in their PR branches. For details, see the thread starting at #3149 (comment).

As @svrnm pointed out during the last Comms meeting, the issue isn't how many submodules we have, it's whether we have submodules at all. Since we're pretty much stuck with submodules, having a few more won't be an extra burden; and besides, IMHO npm run sync will go a long way towards alleviating the pains of submodule syncing.

First: +1 for npm run sync fixing 80% of that, I used it the other day and it does a perfect job, maybe we can turn this into a /fix:sync as well?

Yeah, in my experience it's not one submodule that contributors screw up, it's "all or nothing". Although I agree that adding another 11+ submodules may come with additional challenges, e.g. how much is this slowing down the build process, etc. But this is an issue to worry about later, since we start with one repo now...

Also, I am wondering, how much do contributors need to interact with that? So, if I fix some typos or words in the docs, do I even need to install&use that tooling? Or, even better, could we package all of that in a github workflow that regularly fetches the code repos, executes that code and then raises the appropriate PR?

As maintainers the question we need to ask is whether we want #1635 solved in the repo or not. If we do, then this PR offers one of the best solutions I know of.

I want this to be solved: we are software developers creating software for software developers to make it easier for them to fix problems in their code. So there are rightfully some high expectations for our stuff (including the docs) to work. Also, with our docs still growing quicker than we onboard new docs contributors we need every tooling we can have to keep us afloat.

One organizational note:
One key element for this to be successful is that we setup a proper "contract" between us and the code-owning SIGs. We need some guarantees that this code will live at a certain place in their repository and that it does certain things consistently across languages (e.g. for the getting started the roll the dice app should always run on port 8080, have /rolldice as endpoint, ...). So while trying this out with Go, we might consider establishing some documentation & practices on that (I am wondering if it may help to have @ docs-approvers/maintainers as CODEOWNERS for those files)

cartermp · 2024-02-06T14:51:04Z

For this:

the issue isn't how many submodules we have, it's whether we have submodules at all.

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many. I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations. Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

chalin · 2024-02-06T15:50:09Z

@svrnm wrote:

turn this into a /fix:sync as well?

Great idea! Let's track that via:

[CI] Create an action to run npm run sync #3950

Also, I am wondering, how much do contributors need to interact with that?

Initially, no interaction at all.

So, if I fix some typos or words in the docs, do I even need to install&use that tooling?

No.

Or, even better, could we package all of that in a github workflow that regularly fetches the code repos, executes that code and then raises the appropriate PR?

Yes.

One key element for this to be successful is that we setup a proper "contract" between us and the code-owning SIGs.

Right.

We're trying to solve an inherently challenging problem. I propose that we approach this incrementally. First, let's agree on the base tooling (even if it changes later) so that we can start with something concrete. Then find a SIG willing to have us sync their code. Then incrementally bring in the automation, adjusting as we go along.

My proposal is that contributors not have to interact with this at all in the beginning. We maintainers/approvers would manually run the code-excerpt script when needed -- e.g., to handle situations like #3949 -- and commit changes.

Eventually we'll need to educate contributors that code changes need to happen at the source. There are ways to mitigate this (that I can share at our next Comms meeting), but IMHO we should stay focus on first steps.

@cartermp wrote:

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many.

You're right, of course: the number of submodules does have an incremental impact.

I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations.

That's the plan.

Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

Yup, and we've been trying to do our best to keep all stakeholders as happy as possible (#2448, #3149). There are inherent complexities in the problem we're trying to manage (supporting the build, checking and deployment of a non-trivial tech-docs website with parts spread across repos) beyond "just writing docs". E.g., submodules can be a pain to deal, but most of the time IMHO, they help cleanly solve the problem of keeping content across repos in sync.

svrnm · 2024-02-07T08:34:01Z

<?code-excerpt "otel.go" from="package main"?>

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

{{% code-excerpt "otel.go" from="package main" %}}

We're trying to solve an inherently challenging problem. I propose that we approach this incrementally. First, let's agree on the base tooling (even if it changes later) so that we can start with something concrete. Then find a SIG willing to have us sync their code. Then incrementally bring in the automation, adjusting as we go along.

My proposal is that contributors not have to interact with this at all in the beginning. We maintainers/approvers would manually run the code-excerpt script when needed -- e.g., to handle situations like #3949 -- and commit changes.

Eventually we'll need to educate contributors that code changes need to happen at the source. There are ways to mitigate this (that I can share at our next Comms meeting), but IMHO we should stay focus on first steps.

+1 for this approach, in the worst case(!) we can pull back and revert that tooling. We should gain some experience with it and see if it adds more value vs more issues and then make a judgement. Maybe we need to onboard 2 to 3 SIGs to make a final decision if we want to go all in or not. Let's assume we take 1-2 months to add one more SIG we might be able to go from "beta" to rolling this out in ~3-4 months. WDYT?

@cartermp wrote:

I'd say the more submodules, the more ways things can get out of date, so it's also a matter of how many.

You're right, of course: the number of submodules does have an incremental impact.

I'm happy that we have npm run sync, but I'm hoping we can help a little more with automations.

That's the plan.

Layering on this, the refcache, and the oodles of checks we now have there's a lot of ways for a "simple contribution" to turn into a whole ordeal of unhappiness.

Yup, and we've been trying to do our best to keep all stakeholders as happy as possible (#2448, #3149). There are inherent complexities in the problem we're trying to manage (supporting the build, checking and deployment of a non-trivial tech-docs website with parts spread across repos) beyond "just writing docs". E.g., submodules can be a pain to deal, but most of the time IMHO, they help cleanly solve the problem of keeping content across repos in sync.

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

chalin · 2024-02-09T00:15:52Z

@svrnm wrote:

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

Eventually, but for now I'd like to focus on using the tools rather than customizing them. I have further improvements in mind, but again, I'd rather we gain experience with existing tooling for starters.

+1 for this approach, in the worst case(!) we can pull back and revert that tooling. We should gain some experience with it and see if it adds more value vs more issues and then make a judgement. Maybe we need to onboard 2 to 3 SIGs to make a final decision if we want to go all in or not. Let's assume we take 1-2 months to add one more SIG we might be able to go from "beta" to rolling this out in ~3-4 months. WDYT?

👍🏻

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

Sure, but with the magic and ease of npm run sync, I'd say that there's no extra burden on regular contributors. So I'd avoid the extra cost (for now) of conditional / secondary submodules.

WDYT?

svrnm · 2024-02-09T10:33:38Z

@svrnm wrote:

For the sake of consistency, can we wrap this in a short code that is only there to replace those <? with {{%:

Eventually, but for now I'd like to focus on using the tools rather than customizing them. I have further improvements in mind, but again, I'd rather we gain experience with existing tooling for starters.

+1

If we wrap all that code-updating into a workflow or into something only maintainers touch, I wonder if we should pick some "conditional" submodules, i.e. they are not part of .gitmodules by default and a maintainer (or a bot) needs to run a script first to add and load them, do the script updates, run a script to unload them and send the PR. It adds some extra burden on us / the workflow but it removes the necessity for regular contributors to interact with even more submodules

Sure, but with the magic and ease of npm run sync, I'd say that there's no extra burden on regular contributors. So I'd avoid the extra cost (for now) of conditional / secondary submodules.

If we start with one repo (=> go), I have no concern with adding this one more sub-module for now. Let's make it work with go, and then keep it going for a few weeks and see how many things break.

@pellared , @cartermp: wdyt?

cartermp · 2024-02-09T23:49:59Z

So one thing that this has caused me to think about is adding an action for a "please fix all my shit". So it fixes formatting, the refcache, runs npm sync, etc. Maybe if we not only have the actions in place, but also a single place to mostly fix most things, then that could be enough to ease my unease around the contributor experience.

svrnm · 2024-02-12T10:10:55Z

So one thing that this has caused me to think about is adding an action for a "please fix all my shit". So it fixes formatting, the refcache, runs npm sync, etc. Maybe if we not only have the actions in place, but also a single place to mostly fix most things, then that could be enough to ease my unease around the contributor experience.

Like a fix all?

--> https://github.com/open-telemetry/opentelemetry.io/blob/main/.github/workflows/pr-actions.yml#L82

It does not have all the fixes yet, but updating that workflow file has become much easier now. So that should hopefully help a lot.

Another thing we could do is adding a workflow that checks CI results and let's people know that they can trigger those actions, or that they should not worry about them too much and that we can fix that for them before merging.

hughesjj · 2024-02-12T18:11:03Z

Another thing we could do is adding a workflow that checks CI results and let's people know that they can trigger those actions, or that they should not worry about them too much and that we can fix that for them before merging.

I prefer this approach given it's more "context-aware" and proactive to the contributors issue/flow, rather than "oh read the doc"

hughesjj

lgtm other than one concern/question

hughesjj · 2024-02-12T18:12:30Z

package.json

@@ -43,6 +43,9 @@
    "check:text": "npm run _check:text -- ",
    "check": "npm run seq -- $(npm run -s _list:check:*)",
    "clean": "make clean",
+    "code-excerpts": "rm -Rf tmp/excerpts/* && npm run seq -- code-excerpts:get code-excerpts:update-docs",


just to make sure: the other tmp references are ../tmp and this one is just tmp, is that expected?

Yes, this is expected, but thanks for noticing: the other commands do a cd tools first, hence the leading ../.

svrnm

As discussed yesterday during the SIG call we want to proceed with this.

svrnm · 2024-02-13T14:44:16Z

@chalin update and merge this PR whenever you are ready

svrnm · 2024-02-16T08:45:40Z

awesome, thanks @chalin! Now let's see how this works in the future.

chalin · 2024-02-16T16:58:28Z

@svrnm - I'm OOO next week, but will dive into using this again (probably for the demo) when I'm back (FYI)

chalin added the CI/infra CI & infrastructure label Feb 6, 2024

chalin requested review from a team February 6, 2024 11:41

chalin force-pushed the chalin-code-excerpt-2024-02-04 branch 2 times, most recently from 1a1025a to fee3614 Compare February 6, 2024 11:48

chalin force-pushed the chalin-code-excerpt-2024-02-04 branch from fee3614 to 0cafa2e Compare February 6, 2024 14:27

chalin mentioned this pull request Feb 6, 2024

[CI] Create an action to run npm run sync #3950

Closed

hughesjj approved these changes Feb 12, 2024

View reviewed changes

svrnm approved these changes Feb 13, 2024

View reviewed changes

cartermp approved these changes Feb 13, 2024

View reviewed changes

[CI] Support code-snippet extraction and docs-updating

234cf67

chalin force-pushed the chalin-code-excerpt-2024-02-04 branch from 2fe8b61 to 234cf67 Compare February 16, 2024 00:28

chalin merged commit b1712ad into open-telemetry:main Feb 16, 2024
14 checks passed

chalin deleted the chalin-code-excerpt-2024-02-04 branch February 16, 2024 00:30

chalin mentioned this pull request Feb 16, 2024

Add tooling in support of auto-update of code excerpts in docs #1635

Open

4 tasks

chalin mentioned this pull request Feb 16, 2024

2024/02 planning for Patrice #3918

Closed

8 tasks

svrnm mentioned this pull request Apr 10, 2024

Embed source code snippets from example repos #4276

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] Support code-snippet extraction and docs-updating #3948

[CI] Support code-snippet extraction and docs-updating #3948

chalin commented Feb 6, 2024 •

edited

Loading

pellared commented Feb 6, 2024

cartermp commented Feb 6, 2024

chalin commented Feb 6, 2024 •

edited

Loading

svrnm commented Feb 6, 2024

cartermp commented Feb 6, 2024 •

edited

Loading

chalin commented Feb 6, 2024

svrnm commented Feb 7, 2024

chalin commented Feb 9, 2024

svrnm commented Feb 9, 2024

cartermp commented Feb 9, 2024

svrnm commented Feb 12, 2024

hughesjj commented Feb 12, 2024

hughesjj left a comment

hughesjj Feb 12, 2024

chalin Feb 16, 2024

svrnm left a comment

svrnm commented Feb 13, 2024

svrnm commented Feb 16, 2024

chalin commented Feb 16, 2024

[CI] Support code-snippet extraction and docs-updating #3948

[CI] Support code-snippet extraction and docs-updating #3948

Conversation

chalin commented Feb 6, 2024 • edited Loading

What's this PR about?

How it works

How can I try this myself? (For maintainers/approvers only)

PR details

⚠️ Important: Consider this feature in beta, and for repo maintainers/approvers only.

For later

pellared commented Feb 6, 2024

cartermp commented Feb 6, 2024

chalin commented Feb 6, 2024 • edited Loading

svrnm commented Feb 6, 2024

cartermp commented Feb 6, 2024 • edited Loading

chalin commented Feb 6, 2024

svrnm commented Feb 7, 2024

chalin commented Feb 9, 2024

svrnm commented Feb 9, 2024

cartermp commented Feb 9, 2024

svrnm commented Feb 12, 2024

hughesjj commented Feb 12, 2024

hughesjj left a comment

Choose a reason for hiding this comment

hughesjj Feb 12, 2024

Choose a reason for hiding this comment

chalin Feb 16, 2024

Choose a reason for hiding this comment

svrnm left a comment

Choose a reason for hiding this comment

svrnm commented Feb 13, 2024

svrnm commented Feb 16, 2024

chalin commented Feb 16, 2024

chalin commented Feb 6, 2024 •

edited

Loading

chalin commented Feb 6, 2024 •

edited

Loading

cartermp commented Feb 6, 2024 •

edited

Loading