
Empty robots.txt is reported as not valid #9975

Closed

john-bokma opened this issue Nov 17, 2019 · 5 comments

Comments


john-bokma commented Nov 17, 2019

When a robots.txt of 0 bytes is created (e.g. with `touch robots.txt`), it is reported as:

```
robots.txt is not valid
Lighthouse was unable to download a robots.txt file
```

Site example: https://plurrrr.com/

@connorjclark
Collaborator

An empty robots.txt is equivalent to a missing one as far as crawlers are concerned. However, it's hard to know the intent behind an empty robots.txt for sure; perhaps it was left empty by mistake.

I think we should continue failing this case, but with a better error message that suggests a robots.txt which explicitly allows all crawling (an empty `Disallow:` directive permits everything):

```
User-agent: *
Disallow:
```


RakeshUP commented Feb 6, 2020

If robots.txt is missing or its content is empty, the audit doesn't fail. The case passes due to this piece of code: https://github.com/GoogleChrome/lighthouse/blob/v5.6.0/lighthouse-core/audits/seo/robots-txt.js#L218-L223
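For reference, a paraphrased sketch of the early return those lines perform (the function and names here are made up; this is inferred from the audit's behavior, not copied from the source):

```js
// Hypothetical paraphrase of the guard in robots-txt.js: a missing
// (4xx) or empty robots.txt leaves nothing to validate, so it passes.
function passesEarlyExit({statusCode, content}) {
  return statusCode >= 400 || content === '';
}

console.log(passesEarlyExit({statusCode: 404, content: ''})); // true
console.log(passesEarlyExit({statusCode: 200, content: ''})); // true
console.log(passesEarlyExit({statusCode: 200, content: 'User-agent: *'})); // false
```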

The RobotsTxt audit failed for https://plurrrr.com/ because of the site's Content Security Policy, which blocks fetch calls.
This raises a question: should robots.txt be read using a fetch call, or should it be downloaded using a new tab?
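To illustrate the failure mode, the gatherer evaluates something like the following inside the inspected page (a sketch, not the actual gatherer code), so the page's CSP applies to a request that a real crawler would make without any CSP at all:

```js
// Evaluated in the page context. With a response header such as
//   Content-Security-Policy: default-src 'self'; connect-src 'none'
// the browser rejects the fetch before any request goes out, and the
// audit concludes that robots.txt could not be downloaded.
fetch('/robots.txt')
  .then(response => response.text())
  .then(content => console.log('robots.txt:', content))
  .catch(err => console.error('blocked by CSP:', err));
```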

Screenshot of what happened in the case of https://plurrrr.com/:
cc: @connorjclark, @john-bokma
[Screenshot 2020-02-06 13 21 03]

@InDieTasten

It shouldn't be loaded via fetch. robots.txt should be treated separately from CSPs: it isn't page content, so a CSP shouldn't apply. The browser isn't trying to display its contents as part of a document.

CSP only affects resources loaded by a navigated document (HTML src attributes, fetch calls made by the page, and so on). Search engines request robots.txt directly, without ever knowing about a CSP. Since CSPs are evaluated only on the client side, the current implementation, which respects them after the fact, makes no sense: it does not model the real world.

The robots.txt request must be made separately from the page, e.g. by using a new tab; see the sketch below.
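A minimal sketch of that out-of-band approach, assuming plain Node.js (the URL is just the example site from this issue):

```js
const https = require('https');

// Fetch robots.txt the way a crawler would: a direct HTTP request,
// entirely outside the page context, so no CSP is involved.
https.get('https://plurrrr.com/robots.txt', (res) => {
  let body = '';
  res.on('data', chunk => (body += chunk));
  res.on('end', () => {
    console.log(`status: ${res.statusCode}`);
    console.log(body.length === 0 ? '(empty robots.txt)' : body);
  });
}).on('error', console.error);
```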

This issue should be renamed.

@patrickhulce
Collaborator

This is actually the same root issue as #4386, which is much broader and applies to many areas of Lighthouse. We'll de-dupe into that one.


ashishmondal30 commented Nov 4, 2021

I'm also facing this problem. Can it be an obstacle to crawling and indexing? Even so, my website's speed score is 95+ for both. My website: Best Tech Club
