Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search correct language for Docusaurus docsearch #744

Merged
merged 6 commits into from
Jun 18, 2018

Conversation

JoelMarcey
Copy link
Contributor

Motivation

Only search the current version of the docs. Right now there are duplicates because next is also searched.

https://github.com/facebook/Docusaurus/blob/d28b864a59fabeea45add8c090a13de7d0530de5/lib/core/Site.js#L140 does the replacement

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

eyes

Related PRs

N/A

Only search the current version of the docs. Right now there are duplicates because `next` is also searched.

https://github.com/facebook/Docusaurus/blob/d28b864a59fabeea45add8c090a13de7d0530de5/lib/core/Site.js#L140 does the replacement
@docusaurus-bot
Copy link
Contributor

docusaurus-bot commented Jun 9, 2018

Deploy preview for docusaurus-preview ready!

Built with commit 0dfce4a

https://deploy-preview-744--docusaurus-preview.netlify.com

@endiliey
Copy link
Contributor

endiliey commented Jun 9, 2018

Actually we should probably do it for languages too.

But the current state is that
https://github.com/algolia/docsearch-configs/blob/master/configs/docusaurus.json

Only has 'en' for lang and 'next' for version
So no matter which language or version we always only search on that lang & version.

I am currently away so I won't be able to submit PR for it.

JoelMarcey added a commit to JoelMarcey/docsearch-configs that referenced this pull request Jun 9, 2018
Currently searching both next and latest. Only want latest.

Ref: facebook/docusaurus#744
@JoelMarcey JoelMarcey changed the title Remove duplicate search results Remove duplicate search results and add languages Jun 9, 2018
@JoelMarcey
Copy link
Contributor Author

@endiliey I added a PR to Aloglia

Copy link
Contributor

@endiliey endiliey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a fix to your PR @JoelMarcey

Let's wait for algolia/docsearch-configs#446 to be merged & test it on netlify

@@ -36,6 +36,7 @@ const siteConfig = {
algolia: {
apiKey: '3eb9507824b8be89e7a199ecaa1a9d2c',
indexName: 'docusaurus',
facetFilters: [ "version:VERSION" ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be.

algoliaOptions: {
      facetFilters: [ "version:VERSION"] 
}

@@ -36,6 +36,7 @@ const siteConfig = {
algolia: {
apiKey: '3eb9507824b8be89e7a199ecaa1a9d2c',
indexName: 'docusaurus',
facetFilters: [ "lang:LANGUAGE", "version:VERSION" ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need algoliaOptions for this to work

https://docusaurus.io/docs/en/next/search.html#enabling-the-search-bar

algoliaOptions: {
      facetFilters: ["lang:LANGUAGE", "version:VERSION"] 
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@yangshun
Copy link
Contributor

Is this PR being blocked by algolia/docsearch-configs#446?

@endiliey
Copy link
Contributor

@yangshun. Yes, tbh we need to wait ~24 hours after that PR got merged so algolia crawler had finished crawling & we can test this

s-pace pushed a commit to algolia/docsearch-configs that referenced this pull request Jun 11, 2018
Copy link
Contributor

@endiliey endiliey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

algolia/docsearch-configs#446 only allow latest version.

When we are at docusaurus non-latest version, the search will not work.

Solution

  1. Change "version:VERSION" to "version:latest"
    But this means whichever version we are at, we are searching on latest docs only. I think this is unnatural, let's say I'm at docusaurus version 1.05 page, the latest version is 1.2 and language 'en', it will search for the version 1.2 instead of 1.05

  2. Change algolia config to include all docusaurus version. (Submit another PR at docsearch-configs)
    So when we are at v1.05 and lang 'ko' it will search on that version & language

s-pace pushed a commit to algolia/docsearch-configs that referenced this pull request Jun 11, 2018
@s-pace
Copy link

s-pace commented Jun 11, 2018

👋

I am currently trying to tailor you the best configuration in order to have something simple and easy to understand.

We are facing two majors issue regarding how we scrap your website:

  1. There is no redirection from https://docusaurus.io/docs/<lang>/ to a document page like https://docusaurus.io/docs/<lang>/installation.
    Could it be possible to have such redirection straight away?

  2. Sitemap introduces pages with URL ending by ".html" for the english pages while the same page without this trailing ".html" are still available. This introduces some duplicates since we crawl pages with and without the trailing ".html" in order to discover non-english pages. Could it be possible to avoid these pages with the trailing ".html" from the sitemap?

An easier solution would be to extract the context of the page (version and lang) from the DOM itself. Could it be possible to add meta tags?

@JoelMarcey
Copy link
Contributor Author

@s-pace Hi 👋 -- sorry this is causing so much trouble. But I am hoping if we solve this correctly for Docusaurus, then sites that use Docusaurus can benefit from what we learned.

An easier solution would be to extract the context of the page (version and lang) from the DOM itself. Could it be possible to add meta tags?

I think we can probably do this, maybe. Just any meta tag? e.g.

<meta docLanguage = ...>
<meta docVersion = ...>

If we use meta tags:

  1. Can we avoid the redirection problems you raised?
  2. Would the sitemap issue be fixed too?
  3. Would we need to change anything on the Algolia config side to enable this?

Thanks!

@s-pace
Copy link

s-pace commented Jun 12, 2018

@JoelMarcey

I am hoping if we solve this correctly for Docusaurus, then sites that use Docusaurus can benefit from what we learned.

Makes sense definitely. Let's try to have something working with it.

Using meta tag is fine for us. It will introduce a bit more of complexity within your configuration but we can do it.

  1. Can we avoid the redirection problems you raised?

This will not change the issue we are struggling with since we receive a 404 and thus we stop crawling your website. If this is not possible, we will have to write one start_url object to reach an available page and one for the root in order to define the allowed domain to crawl. Although this can be solved if you provide us a clean and complete sitemap (easiest solution).

  1. Would the sitemap issue be fixed too?

Unfortunately this issue will not be fixed since pages are treated as different pages. The easiest solutions I see:

  1. Would we need to change anything on the DocSearch config side to enable this?

Yes, we will need to change something from the config anyway. This can be quickly done.

@endiliey
Copy link
Contributor

endiliey commented Jun 12, 2018

@s-pace

  1. I see two solutions of this. Either
    A. Redirect docusaurus.io/docs/<lang>/ to any first available docs
    B. Change the docsearch config start_url from docusaurus.io/docs/<lang>/ to docusaurus.io/docs/<lang>/installation so that it wont be 404 & algolia can crawl the site

Which do you prefer ?

  1. In new version of Docusaurus, we provide user two options
    A. Pages only accessible with .html extension
    B. Pages are accessible with and without .html extension

If my understanding is right the solution for this is to provide a complete sitemap that only provide pages with or without .html pages (choose only one)

So the fix is to generate Sitemap with only .html extension for option A and Sitemap without .html extension for option B

At https://docusaurus.io/sitemap.xml

JoelMarcey added a commit to JoelMarcey/docsearch-configs that referenced this pull request Jun 12, 2018
@JoelMarcey
Copy link
Contributor Author

There is no redirection from https://docusaurus.io/docs/<lang>/ to a document page like https://docusaurus.io/docs/<lang>/installation. Could it be possible to have such redirection straight away?

I created algolia/docsearch-configs#453 as a short-term fix to this.

@JoelMarcey
Copy link
Contributor Author

@s-pace I have a PR up for Docusaurus that creates a sitemap depending on whether a config option is set to allow extension-less URLs. So the sitemap will have *.html extensions if that config option is not set. If it is set, then the sitemap will not have any *.html extensions.

@yangshun
Copy link
Contributor

@JoelMarcey Is this good to merge?

@JoelMarcey
Copy link
Contributor Author

I am not sure we need this now. Or at least the whole thing. Do you see duplicate search results when you search now?

Sent with GitHawk

@yangshun
Copy link
Contributor

Actually I never did see duplicate search results before. I'm not seeing it now also. Will let you verify and close it if it's no longer needed.

@JoelMarcey
Copy link
Contributor Author

I look at it again over the weekend and close, update or merge as appropriate.

@endiliey
Copy link
Contributor

Actually the current state is that whichever our language is at, we always search for english language docs.

And whichever version we are currently, we always search for 'latest version'

https://github.com/algolia/docsearch-configs/blob/master/configs/docusaurus.json

@JoelMarcey
Copy link
Contributor Author

Right. I think latest is good. We just want the language support.

@endiliey
Copy link
Contributor

endiliey commented Jun 16, 2018

@JoelMarcey
I'll patch your PR, test & merge it later. I am back today.

Edit:
Blocked by algolia/docsearch-configs#458
DO NOT MERGE 😄

@endiliey endiliey added the status: don't merge yet This PR is completed, but we are still waiting for other things to be ready. label Jun 16, 2018
@endiliey endiliey changed the title Remove duplicate search results and add languages search correct language for Docusaurus docsearch Jun 16, 2018
@endiliey endiliey changed the title search correct language for Docusaurus docsearch Search correct language for Docusaurus docsearch Jun 16, 2018
@s-pace
Copy link

s-pace commented Jun 18, 2018

DocSearch PR merged:

<!-- at the end of the HEAD -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />

<!-- at the end of the BODY -->
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.js"></script>
<script type="text/javascript"> docsearch({
  apiKey: '3eb9507824b8be89e7a199ecaa1a9d2c',
  indexName: 'docusaurus',
  inputSelector: '### REPLACE ME ####',
  algoliaOptions: { 'facetFilters': ["lang:$LANG", "tags:$TAGS"] },
  debug: false // Set debug to true if you want to inspect the dropdown
});
</script>
  • Add a search input in your page if you don't have any yet. Then update the inputSelector value in JS snippet to a CSS selector that targets your search input field.

  • Replace $LANG with the lang you want to search on.
    The list of possible lang is hardcoded in the config.
    So as of today you have: en, es-ES, ro, tr, zh-CN

  • Replace $TAGS with the tags you want to search on.
    The list of possible tags is hardcoded in the config.
    So as of today you have: blog, docs

@endiliey
Copy link
Contributor

Search for chinese when on chinese
chinese

Search for espanol when on espanol
espanol

@endiliey endiliey merged commit df42926 into facebook:master Jun 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Signed Facebook CLA status: don't merge yet This PR is completed, but we are still waiting for other things to be ready.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants