Improved handling of ignore_user_agents #3238
Conversation
Modified the code to skip request logging for all user agents that contain any of the strings listed in the ignore_user_agents config. The previous code matched the full user_agent string against the ignore_user_agents entries with the in_array function, so a request was only skipped when the User-Agent matched an entry exactly, which resulted in some requests not being skipped from logging. Now the code uses the stristr function to perform a partial, case-insensitive match, which improves the behaviour of ignore_user_agents.
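A minimal sketch of the matching change described above, as a standalone function (the name `isIgnoredUserAgent` and its signature are illustrative, not the actual method in the PR):

```php
<?php
// Hypothetical helper illustrating the change: instead of exact
// matching with in_array(), each configured fragment is matched as a
// case-insensitive substring of the User-Agent header.

/**
 * @param string   $userAgent        the request's User-Agent header
 * @param string[] $ignoreUserAgents fragments from the ignore_user_agents config
 */
function isIgnoredUserAgent(string $userAgent, array $ignoreUserAgents): bool
{
    foreach ($ignoreUserAgents as $fragment) {
        // stripos() is a case-insensitive substring search; stristr()
        // behaves the same for a boolean check but returns a string.
        if (stripos($userAgent, $fragment) !== false) {
            return true;
        }
    }
    return false;
}

// "Googlebot/2.0" still matches the "googlebot" fragment even though the
// version differs, which the old exact in_array() comparison would miss.
var_dump(isIgnoredUserAgent(
    'Mozilla/5.0 (compatible; Googlebot/2.0; +http://www.google.com/bot.html)',
    ['googlebot']
)); // bool(true)
```

With exact matching, changing the ignore list entry `Googlebot/2.1` would silently stop matching after a crawler version bump; substring matching makes the list robust to that.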
We could also improve the config by including new bots and by shortening the existing strings so they no longer depend on bot versions and other trailing information: app/code/core/Mage/Log/etc/config.xml
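For illustration only, a hypothetical shape for such entries (the actual node names and defaults in app/code/core/Mage/Log/etc/config.xml may differ; the bot list here is made up):

```xml
<!-- Hypothetical sketch: with substring matching, short fragments are
     enough, and version numbers no longer need to be listed. -->
<config>
    <default>
        <log>
            <visitor>
                <ignore_user_agents>googlebot,bingbot,yandexbot</ignore_user_agents>
            </visitor>
        </log>
    </default>
</config>
```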
Add more bot strings.
Do you think it would make sense to add a fourth option, 'Logged In Only', to record logs only for logged-in customers?
I got a bit confused, sorry. I did some tests and found that stripos is slightly more performant than preg_match. Now it seems OK.
Let’s go, thanks @empiricompany
Can you please share the test results? With my tests I think it's okay (but I would still move that to a new method to reduce complexity in
See an updated list here: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker. The number of bots is much higher there. Besides the risk of decreased performance, the advantages of this PR must be carefully analyzed. Everyone should keep their own list and keep it as short as possible. I use .htaccess not to ignore bots but to block them.
Besides having read on various forums that it is more performant, I did some simple tests with microtime. I didn't have time to run more complex tests on resource usage. The result is a difference of just a few microseconds on a large array taken from here:

2023-05-13T09:07:38+00:00 DEBUG (7): Benchmark preg_match 0.0002

The code with foreach also seems less complex and clearer.
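A sketch of the kind of microtime comparison described above, assuming a foreach/stripos loop versus a single case-insensitive regex with alternation (the agent list, User-Agent string, and iteration count are invented for the example):

```php
<?php
// Rough micro-benchmark sketch: stripos() loop vs. one preg_match()
// with alternation. All inputs here are illustrative.

$agents    = ['googlebot', 'bingbot', 'yandexbot', 'duckduckbot', 'baiduspider'];
$userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$iterations = 100000;

// Variant 1: foreach + stripos()
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $matched = false;
    foreach ($agents as $fragment) {
        if (stripos($userAgent, $fragment) !== false) {
            $matched = true;
            break; // stop at the first hit, as the logging check would
        }
    }
}
$striposTime = microtime(true) - $start;

// Variant 2: single case-insensitive regex with alternation
$pattern = '/' . implode('|', array_map('preg_quote', $agents)) . '/i';
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $matched = (bool) preg_match($pattern, $userAgent);
}
$pregTime = microtime(true) - $start;

printf("stripos: %.4fs  preg_match: %.4fs\n", $striposTime, $pregTime);
```

Absolute timings depend on the PHP version and list size, so the interesting output is the relative difference, not the raw numbers.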
I also use Nginx to block bad bots at the source, but I don't want to block Googlebot. I don't think there are performance differences from simply changing the comparison method. This was supposed to be a very simple PR to keep the exclusion working when the version in the user_agent string changes, e.g. when Googlebot/2.1 becomes Googlebot/2.0. I wanted to include the "compatible;" keyword in the exclusion list, or by default in the method, since I believe it is only used by bots. However, I left this decision to the discretion of the developers. If we want to develop a more complex mechanism, I'm open to other ideas (for example, we could cache the result of the check in the session for subsequent requests).
I still think this PR could be merged as it is; I like it.
It could probably be improved in the future (like everything), but I don't see anything that would prevent us from implementing it :-)