Docs: prevent search engines to index historical versions #6517

Merged

Conversation

khsrali
Contributor

@khsrali khsrali commented Jul 4, 2024

Fixes #6516.
I suggest merging this immediately once the tests have passed and the build is successful.

The only way to know whether this resolves the issue is to wait a few days and see whether Google's index is updated.

@khsrali khsrali requested a review from GeigerJ2 July 4, 2024 11:29
Comment on lines 2 to 5
Allow: /*/latest/
Allow: /en/latest/ # Fallback for bots that don't understand wildcards
Allow: /*/stable/
Allow: /en/stable/ # Fallback for bots that don't understand wildcards
Contributor

I think you copied this from the astro project, but are we sure this syntax is correct? The URL for AiiDA's documentation on RTD starts with /projects/aiida-core/en/stable. Should these rules include the /projects/aiida-core/ prefix?

Contributor Author

You are right!
Yes, assuming RTD puts the file in the right place: https://aiida.readthedocs.io/robots.txt
The paths inside this file are relative to the site domain, so:

Suggested change
- Allow: /*/latest/
- Allow: /en/latest/ # Fallback for bots that don't understand wildcards
- Allow: /*/stable/
- Allow: /en/stable/ # Fallback for bots that don't understand wildcards
+ Allow: /projects/aiida-core/en/latest/
+ Allow: /projects/aiida-core/en/stable/
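
To illustrate why the prefix matters, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the complete file consists of a `User-agent: *` line, the `Allow` rules, and a final `Disallow: /` (the full file is not shown in this thread). The helper name `is_allowed` and the version path `v2.0.0` are hypothetical. This parser does plain prefix matching in file order and does not interpret `*` wildcards, so it behaves like the "bots that don't understand wildcards" mentioned in the comments; a wildcard-aware crawler may treat the `/*/` rules differently.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://aiida.readthedocs.io"

# Assumed file layout: only the Allow lines appear in the thread; the
# User-agent and Disallow lines are assumptions made for illustration.
FALLBACK_RULES = """\
User-agent: *
Allow: /en/latest/
Allow: /en/stable/
Disallow: /
"""

SUGGESTED_RULES = """\
User-agent: *
Allow: /projects/aiida-core/en/latest/
Allow: /projects/aiida-core/en/stable/
Disallow: /
"""


def is_allowed(robots_txt: str, path: str) -> bool:
    """Return True if a generic crawler may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", SITE + path)


stable_page = "/projects/aiida-core/en/stable/index.html"
old_page = "/projects/aiida-core/en/v2.0.0/index.html"  # hypothetical version path

print(is_allowed(FALLBACK_RULES, stable_page))   # the short prefix does not match
print(is_allowed(SUGGESTED_RULES, stable_page))  # the full prefix matches
print(is_allowed(SUGGESTED_RULES, old_page))     # falls through to Disallow: /
```

Under these assumptions, the rule `/en/stable/` never matches a path that begins with `/projects/aiida-core/`, so the stable docs would fall through to the blanket `Disallow`.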

In any case, from the build I now realize that RTD places this file in the wrong location:
https://aiida.readthedocs.io/projects/aiida-core/robots.txt

This makes it undiscoverable by search engines...
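
For reference, crawlers request `robots.txt` from the root of the host (as described in RFC 9309), which is why the location matters. The sketch below, with a hypothetical helper name `robots_url_for`, derives the URL a crawler would request for a given page.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for `page_url`.

    Only the scheme and host (plus port, if any) are kept; the page's own
    path, query, and fragment are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


page = "https://aiida.readthedocs.io/projects/aiida-core/en/stable/index.html"
print(robots_url_for(page))
```

For any page under `/projects/aiida-core/`, the result is still `https://aiida.readthedocs.io/robots.txt`, not the copy under `/projects/aiida-core/`.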

Contributor Author

Alright!
One solution is to set an exact redirect in the RTD web settings tab,
from /robots.txt to /projects/aiida-core/robots.txt

@sphuber I don't have access to the RTD settings. May I ask you to do this, if you agree?

Contributor

@sphuber I don't have access to RTD settings, may I ask you for this? if you agree.

I have gone through the admin panel but cannot find a way to customize the path. The docs say that ReadTheDocs automatically generates and serves a robots.txt. We could set older versions to hidden so that their paths are automatically added to the robots.txt and excluded from indexing. Would that be a better approach, since we would then be sure that the robots.txt is put in the correct place?

Contributor

Actually, hiding versions might not be ideal because some users may want to look at an older version for some reason. We just don't want them to be indexed. And never mind, the docs also describe what should be done for projects using Sphinx, as we are, and it seems your approach is correct.
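
A minimal sketch of the Sphinx-side setup, assuming the approach referred to here is Sphinx's `html_extra_path` option and that a `robots.txt` file sits next to `conf.py` (the file location is an assumption; this thread does not show it):

```python
# conf.py (Sphinx configuration; its location in the repository is assumed)

# Paths listed here are copied as-is into the root of the built HTML output.
# Relative paths are resolved against the directory containing conf.py.
html_extra_path = ["robots.txt"]
```

With this setting, the file is written to the top level of the HTML build for each version, next to `index.html`.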

@khsrali khsrali force-pushed the google-donot-index-500-years-ago-please branch from df43725 to 0a6e426 Compare July 5, 2024 05:50
Contributor

@sphuber sphuber left a comment

Thanks @khsrali. I have validated the robots.txt with this tool, https://technicalseo.com/tools/robots-txt/, and it seems to be correct. So let's merge this and see how it goes.

@sphuber sphuber merged commit 5c1f5d6 into aiidateam:main Jul 5, 2024
4 checks passed
@khsrali khsrali deleted the google-donot-index-500-years-ago-please branch July 9, 2024 14:44
sphuber pushed a commit that referenced this pull request Aug 7, 2024
#6517)

Currently, all versions of the documentation are indexed with the result
that google searches come up with very outdated versions and the latest
version is almost impossible to find. The `robots.txt` now disallows
any path from being indexed except for the `latest` and `stable`
versions of the documentation.

Cherry-pick: 5c1f5d6
mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
aiidateam#6517)
Successfully merging this pull request may close these issues.

Docs: how to stop search engines indexing outdated version of aiida