is there a way to use different proxies for every instances? #36

Closed
Tracked by #8
lxwang42 opened this issue Oct 14, 2018 · 7 comments
Labels
question Further information is requested

Comments

@lxwang42

Some websites have IP-blocking measures, so I have to change the IP for each instance after a while. Is there a way to do that with puppeteer-cluster without starting a new cluster?

@imfht

imfht commented Oct 23, 2018

I suggest handling the proxy problem outside of headless Chrome.
Build a proxy-switch server and set your proxy server to that proxy-switch server.
It looks like this:
[image: default]
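
A minimal sketch of what such a proxy-switch server could look like, assuming Node's built-in `net` module and a plain TCP forwarder that hands each new connection to the next upstream proxy in rotation. The names `Upstream`, `roundRobin`, and `createProxySwitchServer` are hypothetical and not part of any library mentioned in this thread. Because the forwarder only pipes bytes, the upstream proxies would all need to speak the protocol the browser is configured for.

```typescript
import * as net from "node:net";

// Hypothetical description of one upstream proxy.
interface Upstream {
  host: string;
  port: number;
}

// Returns a function that cycles through the upstreams in order.
function roundRobin(upstreams: Upstream[]): () => Upstream {
  if (upstreams.length === 0) {
    throw new Error("need at least one upstream proxy");
  }
  let index = 0;
  return () => {
    const upstream = upstreams[index];
    index = (index + 1) % upstreams.length;
    return upstream;
  };
}

// Creates (but does not start) a TCP server that pipes every incoming
// connection to the next upstream proxy in the rotation.
function createProxySwitchServer(upstreams: Upstream[]): net.Server {
  const next = roundRobin(upstreams);
  return net.createServer((client) => {
    const { host, port } = next();
    const upstream = net.connect(port, host);
    client.pipe(upstream);
    upstream.pipe(client);
    // Tear down the other side if either socket fails.
    client.on("error", () => upstream.destroy());
    upstream.on("error", () => client.destroy());
  });
}
```

To try it, one would call `createProxySwitchServer([...]).listen(8000, "127.0.0.1")` and start Chrome with `--proxy-server=127.0.0.1:8000`. Note that this rotates per connection, so it does not keep a browser instance on a single IP.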

@thomasdondorf
Owner

Yes, perfect answer given by @imfht (thanks!). This is currently not built into this library.

Either do it with a custom proxy server or develop a solution without puppeteer-cluster.

thomasdondorf added the question (Further information is requested) label on Oct 23, 2018
@lxwang42
Author

@imfht Thank you for your suggestion. This load-balancing strategy is really good for scraping, but I want each browser instance to stick to one specific SOCKS5 proxy (like Tor) so that sessions won't get blocked for swapping IPs. I guess puppeteer-cluster is not designed for this kind of task?

@thomasdondorf
Owner

That is currently not supported, but I could modify the library to support your use case (as it is trivial to implement).

What you need to do is copy one of the browser concurrency files and implement a custom launch function:
https://github.com/thomasdondorf/puppeteer-cluster/blob/master/src/browser/ConcurrencyBrowser.ts#L15

If that works for you, I can change the current implementation to support custom concurrency implementations.

@thomasdondorf
Owner

In version 0.12 I added a way to provide any object as the concurrency option. That way, this can be implemented very easily.

The concurrency implementation can be changed to call the launch function with custom arguments (like proxy) for each worker.
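
As a rough illustration of the idea of launching each worker with its own proxy argument, here is a simplified sketch. It is not the library's actual concurrency interface: `ProxyPerWorker`, `BrowserLike`, and `LaunchFn` are hypothetical stand-ins, and a real implementation would need the same methods as the built-in concurrency files. The only real-world detail assumed is Chrome's `--proxy-server` flag.

```typescript
// Minimal stand-ins for the parts of Puppeteer this sketch touches (assumed shapes).
interface BrowserLike {
  close(): Promise<void>;
}
type LaunchFn = (options: { args: string[] }) => Promise<BrowserLike>;

// Hypothetical concurrency-style class: one browser per worker,
// each pinned to its own proxy for the lifetime of that browser.
class ProxyPerWorker {
  private launch: LaunchFn;
  private proxies: string[];
  private launched = 0;

  constructor(launch: LaunchFn, proxies: string[]) {
    if (proxies.length === 0) {
      throw new Error("need at least one proxy");
    }
    this.launch = launch;
    this.proxies = proxies;
  }

  // Called once per worker; proxies are handed out in order and wrap around.
  async workerInstance(): Promise<{ proxy: string; browser: BrowserLike }> {
    const proxy = this.proxies[this.launched % this.proxies.length];
    this.launched += 1;
    const browser = await this.launch({ args: [`--proxy-server=${proxy}`] });
    return { proxy, browser };
  }
}
```

With Puppeteer installed, the `launch` argument could be something like `(options) => puppeteer.launch(options)`. Passing the launcher in keeps the sketch free of dependencies and easy to test with a fake.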

Documentation is coming soon.

@thomasdondorf
Owner

Added tests, fixed a bug, and added documentation. This feature is now ready to use. Just copy one of the concurrency implementations (like Browser) and provide the class to the concurrency option.

@cyxou

cyxou commented Dec 18, 2018

> That is currently not supported, but I could modify the library to support your use case (as it is trivial to implement).
>
> What you need to do is copy one of the browser concurrency files and implement a custom launch function:
> https://github.com/thomasdondorf/puppeteer-cluster/blob/master/src/browser/ConcurrencyBrowser.ts#L15
>
> If that works for you, I can change the current implementation to support custom concurrency implementations.

In my case, I don't know all the proxy IP addresses when the cluster is instantiated. Proxy IPs and the corresponding credentials are obtained from a database, and once they are fetched, a cluster worker needs to be created with those credentials. Currently there is no way to pass any arguments to a custom ClusterImplementation when queuing a worker task. Any thoughts on this use case?
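
One possible direction, sketched under assumptions rather than taken from the library: let the custom implementation pull the next proxy from an async provider at the moment a worker is launched, so nothing has to be known when the cluster is created. `ProxyCredentials`, `ProxyProvider`, and `planWorkerLaunch` are hypothetical names, and the provider here stands in for the database lookup. This does not pass arguments at queue time; it only defers the lookup to launch time.

```typescript
// Hypothetical record as it might come back from the database.
interface ProxyCredentials {
  server: string; // e.g. "http://10.0.0.1:8080"
  username: string;
  password: string;
}

// Hypothetical async source of proxies (a database query in a real setup).
type ProxyProvider = () => Promise<ProxyCredentials>;

interface WorkerLaunchPlan {
  args: string[];
  auth: { username: string; password: string };
}

// Resolves the proxy only when a worker is about to launch.
async function planWorkerLaunch(nextProxy: ProxyProvider): Promise<WorkerLaunchPlan> {
  const { server, username, password } = await nextProxy();
  return {
    // Chrome takes the proxy address as a command-line flag.
    args: [`--proxy-server=${server}`],
    // Credentials are kept separate from the flag; for HTTP proxies,
    // Puppeteer's page.authenticate({ username, password }) can supply them.
    auth: { username, password },
  };
}
```

A custom launch function could call `planWorkerLaunch` once per worker, pass `args` to the browser launch, and apply `auth` to each page it opens.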
