robots.txt missing Disallow #1487
@NJAldwin I've tested using several validators, and some don't complain about the lack of a `Disallow` field.
The addition of `Disallow:` is made in order to be compliant with:

* the `robots.txt` specification (http://www.robotstxt.org/), which specifies that: "At least one Disallow field needs to be present in a record"
* what is suggested in the documentation of most of the major search engines, e.g.:
  - Baidu: http://www.baidu.com/search/robots_english.html
  - Google: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
    http://www.youtube.com/watch?v=P7GY1fE5JQQ
  - Yandex: help.yandex.com/webmaster/controlling-robot/robots-txt.xml

Besides the addition specified above, this commit also adds a comment making it clear to everyone that the directives from the `robots.txt` file allow all content on the site to be crawled.

Ref h5bp/html5-boilerplate#1487.
The addition of `Disallow:` is made in order to be compliant with:

* the `robots.txt` specification (http://www.robotstxt.org/), which specifies that: "At least one Disallow field needs to be present in a record"
* what is suggested in the documentation of most of the major search engines, e.g.:
  - Baidu: http://www.baidu.com/search/robots_english.html
  - Google: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
    http://www.youtube.com/watch?v=P7GY1fE5JQQ
  - Yandex: help.yandex.com/webmaster/controlling-robot/robots-txt.xml

Besides the addition specified above, this commit also:

* adds a comment making it clear to everyone that the directives from the `robots.txt` file allow all content on the site to be crawled
* updates the URL to `www.robotstxt.org`, as `robotstxt.org` doesn't quite work:

      curl -LsS robotstxt.org
      curl: (7) Failed connect to robotstxt.org:80; Operation timed out

Close h5bp#1487.
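Per the commit description above, the resulting `robots.txt` would look roughly like this. This is a sketch based on the bullets in the commit message; the exact wording of the comments is an assumption, not copied from the commit itself:

```
# www.robotstxt.org/

# Allow crawling of all content
User-agent: *
Disallow:
```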
According to the robots.txt standard:

> The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below.

This is also encouraged by the major search engines.

- Baidu: http://www.baidu.com/search/robots_english.html
- Google: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started
  http://www.youtube.com/watch?v=P7GY1fE5JQQ
- Yandex: help.yandex.com/webmaster/controlling-robot/robots-txt.xml

Ref yeoman/generator-webapp#220 h5bp/html5-boilerplate#1487 http://www.robotstxt.org/orig.html
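To illustrate the record structure the standard describes (a hypothetical example; the bot names and paths are made up for illustration, not taken from any commit above), a single record may group several `User-agent` lines followed by several `Disallow` lines:

```
# One record: both user agents share the same disallow rules
User-agent: examplebot
User-agent: otherbot
Disallow: /tmp/
Disallow: /private/
```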
Looking more closely at the robots.txt standard, it states that "At least one Disallow field needs to be present in a record."
The current robots.txt creates a record with:

    User-agent: *

but omits the `Disallow` field. As far as I can understand, an empty `Disallow:` line should be added to make the file valid.

Thoughts?
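For reference, a minimal record that satisfies the "at least one Disallow field" requirement while still allowing everything to be crawled (an empty `Disallow:` value matches no URLs, so nothing is blocked):

```
User-agent: *
Disallow:
```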