
Empty robots.txt is reported as not valid #9975

Closed

john-bokma opened this issue Nov 17, 2019 · 5 comments

Comments


john-bokma commented Nov 17, 2019

When a robots.txt of 0 bytes is created (e.g. with `touch robots.txt`), it is reported as:

```
robots.txt is not valid
Lighthouse was unable to download a robots.txt file
```

Site example: https://plurrrr.com/

@connorjclark
Collaborator

An empty robots.txt is equivalent to a missing one as far as crawlers are concerned. However, it's hard to know the intent behind an empty robots.txt for sure; perhaps it was left empty by mistake.

I think we should continue failing this case, but with a better error message that suggests a robots.txt which explicitly allows all crawling (an empty `Disallow:` directive permits everything):

```
User-agent: *
Disallow:
```


RakeshUP commented Feb 6, 2020

If robots.txt is missing or its content is empty, the audit doesn't fail. The case passes due to this piece of code: https://github.com/GoogleChrome/lighthouse/blob/v5.6.0/lighthouse-core/audits/seo/robots-txt.js#L218-L223
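For reference, a paraphrased sketch of the early return those lines perform (the function and names here are made up; this is inferred from the audit's behavior, not copied from the source):

```js
// Hypothetical paraphrase of the guard in robots-txt.js: a missing
// (4xx) or empty robots.txt leaves nothing to validate, so it passes.
function passesEarlyExit({statusCode, content}) {
  return statusCode >= 400 || content === '';
}

console.log(passesEarlyExit({statusCode: 404, content: ''})); // true
console.log(passesEarlyExit({statusCode: 200, content: ''})); // true
console.log(passesEarlyExit({statusCode: 200, content: 'User-agent: *'})); // false
```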

The RobotsTxt audit failed for https://plurrrr.com/ because of the site's Content Security Policy, which blocks fetch calls.
This raises a question: should robots.txt be read using a fetch call, or should it be downloaded using a new tab?
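To illustrate the failure mode, the gatherer evaluates something like the following inside the inspected page (a sketch, not the actual gatherer code), so the page's CSP applies to a request that a real crawler would make without any CSP at all:

```js
// Evaluated in the page context. With a response header such as
//   Content-Security-Policy: default-src 'self'; connect-src 'none'
// the browser rejects the fetch before any request goes out, and the
// audit concludes that robots.txt could not be downloaded.
fetch('/robots.txt')
  .then(response => response.text())
  .then(content => console.log('robots.txt:', content))
  .catch(err => console.error('blocked by CSP:', err));
```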

Screenshot of what happened in the case of https://plurrrr.com/:
cc: @connorjclark, @john-bokma
[Screenshot 2020-02-06 13 21 03]

@InDieTasten

It shouldn't be loaded via fetch. robots.txt should be treated separately from CSPs: it isn't page content, so a CSP shouldn't apply. The browser isn't trying to display its contents as part of a document.

CSP only affects resources loaded by a navigated document (HTML src attributes, fetch calls made by the page, and so on). Search engines request robots.txt directly, without ever knowing about a CSP. Since CSPs are evaluated only on the client side, the current implementation, which respects them after the fact, makes no sense: it does not model the real world.

The robots.txt request must be made separately from the page, e.g. by using a new tab; see the sketch below.
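A minimal sketch of that out-of-band approach, assuming plain Node.js (the URL is just the example site from this issue):

```js
const https = require('https');

// Fetch robots.txt the way a crawler would: a direct HTTP request,
// entirely outside the page context, so no CSP is involved.
https.get('https://plurrrr.com/robots.txt', (res) => {
  let body = '';
  res.on('data', chunk => (body += chunk));
  res.on('end', () => {
    console.log(`status: ${res.statusCode}`);
    console.log(body.length === 0 ? '(empty robots.txt)' : body);
  });
}).on('error', console.error);
```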

This issue should be renamed.

@patrickhulce
Collaborator

This is actually the same root issue as #4386, which is much broader and applies to many areas of Lighthouse. We'll de-dupe into that one.


ashishmondal30 commented Nov 4, 2021

I'm also facing this problem. Can it be an obstacle to crawling and indexing? Even so, my website's speed score is 95+ for both. My website: Best Tech Club
