Make WORKER_INTERVAL_MS a configurable parameter #44

Open
ThomasProctor opened this issue Jun 5, 2020 · 2 comments
Labels: waiting for response (waiting for issue owner response)

Comments

@ThomasProctor
Contributor

This is something I've found useful for throttling my scrapes to avoid being banned. It just requires 3 changed lines of code.
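The issue does not include the three-line patch itself. The following is a minimal sketch of the idea, assuming a TypeScript/Node codebase. The option name `workerIntervalMs` and the helpers `resolveWorkerInterval` and `runThrottled` are hypothetical and are not taken from the project. It keeps the 1000 ms default mentioned later in this thread and lets a caller override it:

```typescript
// Hypothetical sketch: not the project's actual code or API.
// Default delay between jobs, matching the 1000 ms default discussed in this issue.
const DEFAULT_WORKER_INTERVAL_MS = 1000;

interface ScraperOptions {
  // Hypothetical option name; a caller may override the default interval.
  workerIntervalMs?: number;
}

// Returns the configured interval, falling back to the default,
// and rejects values that cannot be used as a delay.
function resolveWorkerInterval(options: ScraperOptions = {}): number {
  const value = options.workerIntervalMs ?? DEFAULT_WORKER_INTERVAL_MS;
  if (!Number.isFinite(value) || value < 0) {
    throw new RangeError(`workerIntervalMs must be a non-negative number, got ${value}`);
  }
  return value;
}

// Resolves after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Processes jobs one at a time, waiting intervalMs between consecutive jobs.
async function runThrottled<T, R>(
  jobs: T[],
  handler: (job: T) => Promise<R>,
  options: ScraperOptions = {},
): Promise<R[]> {
  const intervalMs = resolveWorkerInterval(options);
  const results: R[] = [];
  for (let i = 0; i < jobs.length; i++) {
    if (i > 0) {
      await sleep(intervalMs); // throttle: no delay before the first job
    }
    results.push(await handler(jobs[i]));
  }
  return results;
}
```

With this shape, a call such as `runThrottled(profileUrls, scrapeProfile, { workerIntervalMs: 100000 })` would wait 100 seconds between profiles. Here `profileUrls` and `scrapeProfile` are placeholders for whatever the scraper actually uses.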

@leonardiwagner
Member

> throttling my scrapes to avoid being banned

Could you explain how much time between requests keeps you from being banned? Thank you!

@leonardiwagner leonardiwagner added the waiting for response waiting for issue owner response label Jun 9, 2020
@ThomasProctor
Contributor Author

Right now, I know that the default (1000 ms) has gotten me banned after scraping around 200 profiles. I've been able to scrape safely with a much higher value, around 100000 ms.

I've based these values on numbers from PhantomBuster, which seem extremely conservative. I'm still working out exactly how low I can go. I assume the safe interval also depends on other behavior, so it will take some experimentation.
