Spiders catch-all isn't catching spiders with longer user agent strings #590

Open
jrobinson-rdm opened this issue Jun 21, 2024 · 0 comments

I would expect the following user agents to be matched by the spiders catch-all:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.2210.133 VirusTotalBot
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0;  https://openai.com/bot

I'm not a regex expert, but I suspect they aren't matched because, in each one, the matching token (Bot, facebookexternalhit, and bot, respectively) doesn't occur within the first 100 characters. Would it make sense to check the first 200 characters for the spiders catch-all, similar to the general matcher for bots?
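
To make the suspicion above easier to check, here is a rough Python sketch. The token list and the `bounded_matcher` helper are hypothetical stand-ins that I made up for illustration; they are not the project's actual catch-all regex. The sketch only shows where the first bot-like token starts in each user agent, and whether a pattern that allows at most 100 (versus 200) leading characters would reach it.

```python
import re

# The three user agents from this report, copied verbatim.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.2210.133 VirusTotalBot",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0;  https://openai.com/bot",
]

# ASSUMPTION: a simplified token list, not the real catch-all pattern.
TOKENS = r"(?:bot|spider|crawl|facebookexternalhit)"


def first_token_offset(ua):
    """Return the 0-based offset of the first bot-like token, or None."""
    m = re.search(TOKENS, ua, re.IGNORECASE)
    return m.start() if m else None


def bounded_matcher(limit):
    """Hypothetical catch-all: the token may be preceded by at most `limit` characters."""
    return re.compile(r"^.{0,%d}?%s" % (limit, TOKENS), re.IGNORECASE)


if __name__ == "__main__":
    for ua in USER_AGENTS:
        offset = first_token_offset(ua)
        hit_100 = bool(bounded_matcher(100).search(ua))
        hit_200 = bool(bounded_matcher(200).search(ua))
        print(f"offset={offset} limit100={hit_100} limit200={hit_200}")
```

If the real catch-all bounds its prefix in a similar way, all three strings would be missed at a limit of 100 and caught at 200, which is consistent with what I'm seeing.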
