Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render Markdown in search results #77686

Merged
merged 6 commits into from
Dec 4, 2020

Conversation

camelid
Copy link
Member

@camelid camelid commented Oct 8, 2020

Fixes #32040.

Previously Markdown documentation was not rendered to HTML for search results,
which led to the output not being very readable, particularly for inline code.
This PR fixes that by rendering Markdown to HTML with the help of pulldown-cmark
(the library rustdoc uses to parse Markdown for the main text of documentation).
However, the text for the title attribute (the text shown when you hover over an
element) still uses the plain-text rendering since it is displayed in browsers
as plain-text.

Only these styles will be rendered; everything else is stripped away:

  • italics
  • bold
  • inline code

@camelid camelid added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Oct 8, 2020
@rust-highfive
Copy link
Collaborator

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez

@rust-highfive
Copy link
Collaborator

r? @jyn514

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 8, 2020
@camelid
Copy link
Member Author

camelid commented Oct 8, 2020

Here's what it looks like:

image

Still some bugs to fix, like the brackets in the code links. Interestingly, links are rendered correctly with plain text (i.e., rendered as just the link text), but not with inline code. Also, some of the summaries have an extra space before the ellipsis.

@camelid
Copy link
Member Author

camelid commented Oct 8, 2020

Also the summary for Command::arg0() isn't being rendered for some reason. Maybe it's because it's platform-specific? (It's not being rendered in the official docs either though, so not because of this.)

@jyn514
Copy link
Member

jyn514 commented Oct 8, 2020

cc @euclio , you know this code better than I do.

@jyn514
Copy link
Member

jyn514 commented Oct 8, 2020

@camelid have you considered the issues ollie mentioned in #68699 (comment) ?

@camelid
Copy link
Member Author

camelid commented Oct 8, 2020

Ah, I was trying to figure out where the alt-text was used. I'll add back the plain summary function in addition, and use that for alt-text.

@jyn514
Copy link
Member

jyn514 commented Oct 8, 2020

cc @GuillaumeGomez , this replaces #68699. I'm not yet sure how it differs from that PR, although I think it does go through pulldown instead of re-parsing markdown in javascript.

@camelid
Copy link
Member Author

camelid commented Oct 8, 2020

Yes, this approach is to render the HTML using pulldown ahead of time.

@camelid camelid force-pushed the rustdoc-render-search-results branch 2 times, most recently from e766c01 to aa88d52 Compare October 8, 2020 23:23
@camelid
Copy link
Member Author

camelid commented Oct 8, 2020

Hmm, it's still rendering alt-text as HTML...

@bors

This comment has been minimized.

@camelid camelid force-pushed the rustdoc-render-search-results branch from b2fe8de to e1e172f Compare October 24, 2020 01:00
@camelid
Copy link
Member Author

camelid commented Oct 24, 2020

Rebased.

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was trying to figure out where the alt-text was used. I'll add back the plain summary function in addition, and use that for alt-text.

Did you fix this? If so, can you add a test?

We discussed in PMs that you're still working on this so I'm not doing a full review for now - let me know when you've got this working and I'll take another look.

src/librustdoc/html/markdown.rs Outdated Show resolved Hide resolved
src/librustdoc/html/static/main.js Show resolved Hide resolved
src/librustdoc/html/render/mod.rs Outdated Show resolved Hide resolved
src/librustdoc/html/markdown.rs Outdated Show resolved Hide resolved
@jyn514 jyn514 added the A-rustdoc-search Area: Rustdoc's search feature label Oct 24, 2020
Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'm the best reviewer for this.

r? @GuillaumeGomez

src/librustdoc/html/static/main.js Outdated Show resolved Hide resolved
@camelid camelid marked this pull request as ready for review November 6, 2020 20:09
return html.replace('<code>', '`').replace('</code>', '`')
.replace(/<[^>]+>/, '');
var dom = new DOMParser().parseFromString(
html.replace('<code>', '`').replace('</code>', '`'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is no longer necessary I think.

Suggested change
html.replace('<code>', '`').replace('</code>', '`'),
html,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I take it back - it is necessary if you want to replace code blocks with `. Not sure why rustdoc is doing that, though.

Copy link
Member Author

@camelid camelid Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not strictly necessary, but I think it's better to do it this way, for one since it used to use ` here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it kinda makes sense. However, we also keep other HTML tags like <b> and <i>. Should we also "revert" them into markdown too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not strictly necessary, but I think it's better to do it this way, for one since it used to use ` here.

My comment still stands. :p

Copy link
Member Author

@camelid camelid Nov 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have the same issue as in #68699 (comment) that we don't know what was originally html and what was markdown

I do not think that is correct. The example given in #68699 (comment) was:

/// foo `this is a code span` bar \`this isn't\` baz
pub struct Foo;

That is stored in the search index as:

foo <code>this is a code span</code> bar `this isn't` baz

And then convertHTMLToPlaintext converts that into:

foo `this is a code span` bar `this isn't` baz

Which is exactly the same as the current hover text (that you get with rustdoc nightly).


That said, I am open to just stripping every HTML tag if we decide that would be better. I just don't think #68699 (comment) is relevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camelid the test case is the other way:

/// foo <code>this is a code span</code> in HTML
///
/// `this is markdown` baz
pub struct Foo;

There's no way to get back that info after the fact.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copy-pasted the test case from the comment you linked to. The comment doesn't seem to be taking issue with the example you're giving. This is the full comment text:

No. The summaries stored in the search index don't contain that information. For:

/// foo `this is a code span` bar \`this isn't\` baz
pub struct Foo;

the search index contains:

foo `this is a code span` bar `this isn't` baz

There is no way in JavaScript to know which backticks meant codeblocks and which didn't.

As I've said, to do this right the summaries need to be rendered and stored in the search index as HTML.

It is saying that converting backticks into HTML is a problem because they both mean inline code and literal backticks. Converting in the opposite direction is loss-less.


And, either way, I don't see how converting both `code` and <code>code</code> into `code` when this is just for alt-text is an issue. If for some reason this is a blocker, I am absolutely fine to just strip all HTML tags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure about this but I don't think it should block this change. So if @GuillaumeGomez thinks this is good to go I do too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said I didn't. 😛

@camelid
Copy link
Member Author

camelid commented Dec 3, 2020

@Mark-Simulacrum Wait, but this will make the squashed commit show as authored by bors.

@bors squash-

@bors
Copy link
Contributor

bors commented Dec 3, 2020

@camelid: 🔑 Insufficient privileges: not in try users

@jyn514
Copy link
Member

jyn514 commented Dec 3, 2020

@bors squash-

@Mark-Simulacrum
Copy link
Member

I am confused -- my understanding is that the functionality in bors does not produce that result? It'll set the committer to bors most likely but that seems fine?

@camelid
Copy link
Member Author

camelid commented Dec 3, 2020

I'm not sure, but I'd rather squash by hand if you want me to squash.

@Mark-Simulacrum
Copy link
Member

I would personally like to at least get rid of the "Fix typo" commit, I don't care strongly about others though.

Previously Markdown documentation was not rendered to HTML for search results,
which led to the output not being very readable, particularly for inline code.
This PR fixes that by rendering Markdown to HTML with the help of pulldown-cmark
(the library rustdoc uses to parse Markdown for the main text of documentation).
However, the text for the title attribute (the text shown when you hover over an
element) still uses the plain-text rendering since it is displayed in browsers
as plain-text.

Only these styles will be rendered; everything else is stripped away:

* *italics*
* **bold**
* `inline code`
@GuillaumeGomez was concerned about browser compatibility.
Accidentally removed in rebase.
@camelid camelid force-pushed the rustdoc-render-search-results branch from 4cb30b2 to 376507f Compare December 3, 2020 22:12
@camelid
Copy link
Member Author

camelid commented Dec 3, 2020

I squashed in the "Fix typo" commit. @Mark-Simulacrum could you r=GuillaumeGomez?

@Mark-Simulacrum
Copy link
Member

I guess 376507f is also a typo but I'm out of energy here so @bors r=GuillaumeGomez delegate+

@bors
Copy link
Contributor

bors commented Dec 3, 2020

📌 Commit 376507f has been approved by GuillaumeGomez

@bors
Copy link
Contributor

bors commented Dec 3, 2020

✌️ @camelid can now approve this pull request

@camelid
Copy link
Member Author

camelid commented Dec 3, 2020

If it's a big deal, I can squash more away, but I would really like to get this merged.

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 4, 2020
Rollup of 11 pull requests

Successful merges:

 - rust-lang#77686 (Render Markdown in search results)
 - rust-lang#79541 (Doc keyword lint pass)
 - rust-lang#79602 (Fix SGX CI)
 - rust-lang#79611 (Use more std:: instead of core:: in docs for consistency)
 - rust-lang#79623 (Pass around Symbols instead of Idents in doctree)
 - rust-lang#79627 (Update cargo)
 - rust-lang#79631 (disable a ptr equality test on Miri)
 - rust-lang#79638 (Use `item_name` instead of pretty printing for resolving `Self` on intra-doc links)
 - rust-lang#79646 (rustc_metadata: Remove some dead code)
 - rust-lang#79664 (move interpret::MemoryKind::Heap to const eval)
 - rust-lang#79678 (Fix some clippy lints)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors
Copy link
Contributor

bors commented Dec 4, 2020

⌛ Testing commit 376507f with merge e9dd18c...

@bors bors merged commit 6f2fbc1 into rust-lang:master Dec 4, 2020
@rustbot rustbot added this to the 1.50.0 milestone Dec 4, 2020
@GuillaumeGomez
Copy link
Member

If it's a big deal, I can squash more away, but I would really like to get this merged.

Don't worry it wasn't a big issue. Thanks a lot for working on this!

@camelid camelid deleted the rustdoc-render-search-results branch December 4, 2020 20:12
@camelid
Copy link
Member Author

camelid commented Dec 4, 2020

Yay! Excited to look at the docs for the next nightly :)

camelid added a commit to camelid/rust that referenced this pull request Dec 7, 2020
I added `summary_opts()` before I cut the branch for rust-lang#77686 (2 months
ago!), so this "slipped through the cracks".
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 8, 2020
…Gomez

Use `summary_opts()` in another spot

I added `summary_opts()` before I cut the branch for rust-lang#77686 (2 months
ago!), so this "slipped through the cracks".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-markdown-parsing Area: Markdown parsing for doc-comments A-rustdoc-search Area: Rustdoc's search feature S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rustdoc search results ignore markdown formatting
7 participants