Need a flag to deactivate robots.txt validation #508
Changing the user-agent name when robots.txt is disabled is interesting, as it would fix the blacklist problem. Let's see what happens!
Please do fork/provide a patch, etc.
Never mind, I did it myself. But next time, keep in mind that it's useful for everyone else out there if you do provide a patch, as we share the work! :)
This issue, as well as #556, can be closed, as the feature is already in a released version.
@peterhoeg, if you have a chance, would you mind pointing me to where this feature is located? It's not obvious to me from the docs or …
Hmm, I see the …
Ahh, it appears I need version 9.4 of the new GitHub repo. More info here: linkchecker/linkchecker#4. Thanks for maintaining this!
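For anyone else landing here, a minimal invocation sketch. It assumes the option added in the newer releases is named `--no-robots`; confirm against `linkchecker --help` for your installed version:

```sh
# Sketch: assumes linkchecker >= 9.4 from the new GitHub repo and that the
# option disabling robots.txt checks is named --no-robots.
linkchecker --no-robots https://example.com/
```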
This bug is related to #127 (and presumably others).
OK, I understand that you don't want your tool to be blacklisted (?) by some web admins.
However, from a usability point of view, this is a real hassle. Consider this: I am a system administrator trying to write an automated tool to check for dead links on a huge bunch of websites, some belonging to customers of mine, some to friends, and some of them my own personal sites (yeah, it's a big project).
If I need to go to each website and add a LinkChecker exception rule to its robots.txt file (roughly the kind of rule sketched after this comparison), that will take me a few days (contacting whoever is in charge, getting FTP passwords and authorization to modify the files, etc.).
If I fork your project and add that option myself, that will take me ~15 minutes.
After that, I can write my cron script on my computer using said flag, and that would take me ~3 seconds.
Time gain: a few days, during which I'll be able to drink beer with my friends instead of doing some menial administrative work.
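For reference, the per-site exception mentioned above would look roughly like this. The `LinkChecker` user-agent token is an assumption here, so confirm what your version actually sends before rolling this out:

```
# robots.txt on each site: explicitly allow LinkChecker while leaving
# existing rules for other crawlers in place.
User-agent: LinkChecker
Allow: /

User-agent: *
Disallow: /private/
```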
Btw, your blacklisting argument is moot, IMHO. First, you could add such an option without risking blacklisting. For example, we could imagine that a --dontcare-robots flag changes LinkChecker's user-agent string to, say, "BadBoyLinkChecker/x.y", which would allow sysadmins to blacklist that mode if necessary without blacklisting the "normal" mode (see the sketch just below).
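Worth noting: since that mode would ignore robots.txt by definition, "blacklisting" it would have to happen at the web-server level rather than in robots.txt. A minimal nginx-flavored sketch, using the hypothetical agent name from the comment above:

```nginx
# Inside a server block: refuse requests from the hypothetical
# "don't care" user-agent; the normal LinkChecker agent is unaffected.
if ($http_user_agent ~* "BadBoyLinkChecker") {
    return 403;
}
```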
Second, this is the World Wide Web we're talking about. Maybe some other web spiders/crawlers are already spoofing your user-agent string for malicious purposes. And if that has not happened yet, you can be sure it will anyway. Better get used to the idea... Some sysadmins will blacklist LinkChecker; there's nothing you can do about that. But those of us who want to use it anyway have a legitimate need for the flag I describe.
It is quite an easy change to make, and it would be so useful that I am sure I am not the only one considering starting a fork just for the sake of adding that feature. Think about it: it's an open-source project! Forking is easy... But then I'd need to sync it regularly with whatever changes you make. It would be so much better for everyone if I could convince you to add this option yourself :-). Less effort, less time wasted for everyone, and more power to the users, who are adults and do not need software to artificially restrict a freedom they usually use responsibly.
Anyway, thanks for this great piece of software :-). Looking forward to seeing this solved.