Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(document-extractor): split h4 sections #4609

Closed
wants to merge 51 commits into from
Closed

fix(document-extractor): split h4 sections #4609

wants to merge 51 commits into from

Conversation

tannerdolby
Copy link
Contributor

@tannerdolby tannerdolby commented Sep 3, 2021

Summary

Fixes #4370

Problem

Currently only <h1>-<h3> are turned into direct links, which are incredible useful for sharing a link directly to a section in the docs. Since the h4 tags aren't apart of the section splitting logic, searching for a <h4> which is deeply nested in a section can be time consuming if your trying to quickly share a direct reference to a section that begins with an h4.

Solution

Add <h4> tags into the existing section splitting logic.


Screenshots

Before

Screen Shot 2022-05-13 at 10 28 17 PM

After

Screen Shot 2022-05-13 at 10 30 42 PM


How did you test this change?

Running Yari locally and checking a few pages manually.


Original description

This branch is up-to-date with the latest main so the changes to the production workflow are included. I'm still in the process of running a full build of all the content to pinpoint where exactly the problematic code was.

After running a full build (mdn/content and mdn/translated-content) using yarn build, it does successfully build the 42k pages in 74.5 minutes but with a bunch of flaws. Which the flaws might be fine, I'm just not sure what the baseline was in order to compare before/after.

There were lots of MacroLiveSampleError saying "unable to find any live code samples for 'X'"

MacroLiveSampleError within /Users/TannerDolby/translated-content/files/de/learn/javascript/first_steps/what_is_javascript/index.html, line 84 column 4 (unable to find any live code samples for "A_high-level_definition" within /de/docs/Learn/JavaScript/First_steps/What_is_JavaScript) 

MacroLiveSampleError within /Users/TannerDolby/translated-content/files/de/web/css/calc()/index.html, line 77 column 4 (unable to find any live code samples for "Relative_Grösse_eines_Objekts_mit_einer_absoluten_Positionierung" within /de/docs/Web/CSS/calc())

but other than that nothing actually made the build break or fail. There were lots of CSSRef issues where the smartlink is broken

In CSSRef the smartLink to /de/docs/Web/CSS/Layout_cookbook/Pagination is broken
In CSSRef the smartLink to /de/docs/Web/CSS/Layout_cookbook/Card is broken
In CSSRef the smartLink to /de/docs/Web/CSS/Layout_cookbook/Grid_wrapper is broken

and this is the output from the full build (content and translated content)

...
/Users/TannerDolby/yari/client/build/en-us/docs/webassembly/understanding_the_text_format
/Users/TannerDolby/yari/client/build/en-us/docs/webassembly/using_the_javascript_api
Built 42,430 pages in 74.5 minutes, at a rate of 9.5 documents per second.
Peak heap memory usage: 1.48 GB

Total_Flaws_Count
bad_bcd_links            2,558
bad_bcd_queries          836
bad_pre_tags             3,166
broken_links             87,346
heading_links            1,842
image_widths             6,758
images                   10,068
macros                   78,801
sectioning               2,204
translation_differences  17,756

✨  Done in 4475.46s.

cc @escattone

@tannerdolby
Copy link
Contributor Author

tannerdolby commented Sep 3, 2021

It looks like within the Functional Testing suite the playwright test headless command is displaying this error:

  1. testing/tests/headless.sitesearch.spec.js:10:3 › Site search submit the autocomplete search form will redirect to site search .. Timeout of 30000ms exceeded.

@schalkneethling
Copy link
Contributor

It looks like within the Functional Testing suite the playwright test headless command is displaying this error:

  1. testing/tests/headless.sitesearch.spec.js:10:3 › Site search submit the autocomplete search form will redirect to site search .. Timeout of 30000ms exceeded.

Looks like that was a hiccup. I reran all the actions and everything looks to be green now.

@schalkneethling
Copy link
Contributor

@tannerdolby I cannot remember. With this PR, did you manage to address the issue that we ran into the last time we merged this work, or is that still on the to-do list? Thanks!

@tannerdolby
Copy link
Contributor Author

tannerdolby commented Sep 28, 2021

@schalkneethling No problem! I've looked through the proposed changes here and couldn't pinpoint exactly what caused the issue when it was previously merged. If we can figure out how to create a reproducible example of that error then the problematic code should hopefully become more clear to us.

@schalkneethling
Copy link
Contributor

@schalkneethling No problem! I've looked through the proposed changes here and couldn't pinpoint exactly what caused the issue when it was previously merged. If we can figure out how to create a reproducible example of that error then the problematic code should hopefully become more clear to us.

@escattone Thoughts?

@escattone
Copy link
Contributor

@tannerdolby Thanks for all of your work to date on this! Unfortunately, the engineering team doesn't have much time to look into this further given other priorities. If you have the time and desire, for next steps I would recommend:

  1. Running the build within this branch on the translated content to identify one or more documents that fail.
  2. Selecting one of those documents as your test case to use for debugging the root cause.

@tannerdolby
Copy link
Contributor Author

tannerdolby commented Nov 7, 2021

You're welcome @escattone! I have the time/desire to see this through so I'm happy to stick around until we pinpoint the root cause. I've done the steps you suggested and learned a few things.

When running a full build on translated content within this branch, the first translated content page to raise an error is:

/Users/TannerDolby/translated-content/files/de/learn/html/introduction_to_html/advanced_text_formatting/index.html

Error: MacroLiveSampleError within /Users/TannerDolby/translated-content/files/de/learn/html/introduction_to_html/advanced_text_formatting/index.html, line 145 column 4 (unable to find any live code samples for "Playable_code_1" within /de/docs/Learn/HTML/Introduction_to_HTML/Advanced_text_formatting)

and for the remainder of the build many errors are logged but none of the pages actual fail and cause the build to halt.

Inside of index.js where this MacroLiveSampleError flaw is handled, if (flaw.name === "MacroLiveSampleError") { ... there is a comment mentioning,

// As of April 2021 there are 0 pages in mdn/content that trigger
// a MacroLiveSampleError. So we can be a lot more strict with en-US
// until the translated-content has had a chance to clean up all
// their live sample errors.
// See https://github.com/mdn/yari/issues/2489

As the only major errors when running a build on translated content are MacroLiveSampleErrors, it leads me to believe that this is probably where the error that broke the build might be coming from.

if (document.metadata.locale === "en-US") {
          throw new Error(
            `MacroLiveSampleError within ${flaw.filepath}, line ${flaw.line} column ${flaw.column} (${flaw.error.message})`
          );
        } else {
          console.warn(
            `MacroLiveSampleError within ${flaw.filepath}, line ${flaw.line} column ${flaw.column} (${flaw.error.message})`
          );
        }

@tannerdolby
Copy link
Contributor Author

tannerdolby commented Nov 8, 2021

In this comment, Peter mentions, "So now, for en-US, it will crash the CI build, for translated-content it will just be a console.warn". Which is what I'm seeing, it will just be a console.warn (not an error that will crash the CI build) which I'm receiving in the logs, I just can't reproduce the CI build crashing. Since I didn't see any MacroLiveSampleErrors for en-US pages and only for translated-content, it led me to believe the throw new Error() was never being called since that would produce an error that crashes the build. I'm not entirely sure if this is conflicting the with logic for handling headings in this PR but it seemed worthwhile to learn more about.

There were other errors (aside from the MacroLiveSampleError) in the build for content and translated content like:

  • In CSS_Ref the smartLink to /es/docs/Web/CSS/transform-function/rotateX is broken (many of these)
  • Images missing src attributes: In /es/docs/Learn/Server-side/Configuring_server_MIME_types there's an img tag without src (<img align="right" alt="Example of an incorrect MIME type result">)
  • Can't get English original from fr/conflicting/learn/css/building_blocks/styling_tables
  • Unable to decodeURI 'https://www.ecma-international.org/ecma-262/6.0/#sec-get-%typedarray%.prototype.buffer'. Will proceed without.

@schalkneethling I will use one of these first few translated pages as a test case and scan through my proposed changes in this PR to better figure out what could be causing problems for translated-content. I will do some further investigating.

@schalkneethling
Copy link
Contributor

Thanks so much @tannerdolby. Much appreciated.

@tannerdolby
Copy link
Contributor Author

tannerdolby commented Sep 5, 2022

@caugner Thanks for review so far. I've added the suggestions you had and everything is running now. The only problem is that functional testing is failing and<h4> elements don't have the expected <a> tags, the test cases for h4s are only picking up the text content when .html() is asserted against so the links arent being generated.

Apologies for all the testing/debugging commits in this last batch. The yarn test:testing wasn't producing the live results when running locally so I was relying on the build & testing pipeline logs here on GitHub.

Expected: "<a href=\"#final_heading\" title=\"Permalink to Final heading\">Final heading</a>"
    Received: "Final heading"

1805 |   expect(h4.html()).toBe(
     |                    ^
1806 |     '<a href="#final_heading" title="Permalink to Final heading">Final heading</a>'
1807 |   );

I haven't been in the codebase in awhile so I'm still getting familiarized with things again, but I could imagine the document-extractor.js just needs to be updated to implement similar logic as the h2/h3s within the _addSectionProse function. I tried experimenting but wasn't entirely sure how the whole flow worked. It might be something straightforward that I'm not seeing.

I had to add isH4 as a property to the types in document.ts that could be assigned to the Section type. This was for resolving the errors that were being thrown to get things running but it shouldn't have any destructive behavior.

@caugner
Copy link
Contributor

caugner commented Nov 11, 2022

FWIW We need to make sure that doesn't break the EmbedLiveSample macro (see this issue).

@github-actions github-actions bot added the merge conflicts 🚧 Please rebase onto or merge the latest main. label Nov 23, 2022
@github-actions
Copy link
Contributor

This pull request has merge conflicts that must be resolved before it can be merged.

@github-actions github-actions bot added the idle label Dec 23, 2022
@github-actions github-actions bot removed the idle label Jun 19, 2023
@github-actions github-actions bot added the idle label Jul 20, 2023
@caugner caugner removed their assignment Jul 21, 2023
@github-actions github-actions bot removed the idle label Jul 21, 2023
@github-actions github-actions bot added the idle label Aug 21, 2023
@tannerdolby tannerdolby closed this by deleting the head repository Sep 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧑‍🤝‍🧑 community contributions by our wonderful community idle merge conflicts 🚧 Please rebase onto or merge the latest main.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Should <h4> tags get direct linkified?
5 participants