Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use HTTP_PROXY and HTTPS_PROXY from env, if set #382

Merged
merged 3 commits into from
Apr 7, 2022
Merged

Conversation

tomas
Copy link
Owner

@tomas tomas commented Nov 2, 2021

Should fix #336 336

@nejch
Copy link

nejch commented Nov 5, 2021

Hi @tomas thanks for this, I just found this and it'd be great for CI checks using https://github.com/tcort/markdown-link-check / https://github.com/tcort/link-check!

Do you think you could add lowercase variables and no_proxy as well?

For some more context, see:
https://about.gitlab.com/blog/2021/01/27/we-need-to-talk-no-proxy/
https://superuser.com/questions/944958/are-http-proxy-https-proxy-and-no-proxy-environment-variables-standard

@tomas
Copy link
Owner Author

tomas commented Dec 1, 2021

Ok just did a quick implementation for the NO_PROXY check. Havent tried it out though. :)

@sgoff0
Copy link

sgoff0 commented Dec 1, 2021

Ok just did a quick implementation for the NO_PROXY check. Havent tried it out though. :)

@tomas I just tested the code snippet you added and this did not solve the problem for my use case. I'll have time to poke around a bit in the next hour or so and provide some feedback.

@sgoff0
Copy link

sgoff0 commented Dec 1, 2021

Here is an example of what worked for me (functions taken from request library) master...sgoff0:sgoff0/no_proxy

The big problem I ran into with your implementation is that NO_PROXY value of 127.0.0.1 doesn't match anything in real life because your logic wants the port (a.host == b.host) to perfectly match. If you were to continue down this path instead of using what the request library does it probably should match on hostname if no port is provided in NO_PROXY
image

Also it's important in my scenario that config.proxy is set to null if there is a match on NO_PROXY, just skipping existing config.proxy conditional logic wasn't enough.

@nejch
Copy link

nejch commented Jan 21, 2022

Thanks for trying this out as well @sgoff0!

@tomas thanks also for adding no_proxy. Is there anything else we can do to help push this along?

@schackito
Copy link

Hello, what's the status of this? Will you merge this soon?

@tomas
Copy link
Owner Author

tomas commented Apr 7, 2022

Looking at this now.

Busy weeks lately, sorry for the wait.

@tomas tomas merged commit d93c442 into master Apr 7, 2022
@tomas tomas deleted the env-http-proxy branch April 7, 2022 13:39
@schackito
Copy link

schackito commented Apr 7, 2022

Looking at this now.

Busy weeks lately, sorry for the wait.

Thanks a lot!

@lyswhut
Copy link

lyswhut commented May 27, 2022

We should state in the documentation that it will automatically read HTTP_PROXY, HTTPS_PROXY, etc. of environment variables,

Because not in all cases you want it to automatically apply the proxy settings in the environment variables.

Before this I didn't know it would automatically apply the proxy settings of the environment variable, and later found this problem after troubleshooting, and now I pass delete process.env.HTTP_PROXY to stop it from automatically applying the proxy

@nejch
Copy link

nejch commented May 27, 2022

That makes sense, though it's fairly standard behavior for most web clients to accept those environment variables by default, and the proper way to disable it is not to delete them but to use the NO_PROXY variable (though I understand this is not properly parsed here, I might try to contribute a fix).

@lyswhut
Copy link

lyswhut commented May 27, 2022

Now in my case I don't want to let all requests go through the proxy,
I configured HTTP_PROXY to be used by other programs, so removing them has no effect for me,
But I suggest that it would be better to provide a configuration to enable them, and it should be off by default, because node.js requests by default do not read proxy settings in environment variables

@tomas
Copy link
Owner Author

tomas commented May 27, 2022

@nejch It should be properly parsed. What do you mean exactly?

I'm also inclined to let the user use the NO_PROXY env variable if you want to skip proxying to a specific host declared in HTTP_PROXY, but I guess there might be a case where you want to instruct Needle to skip detection of env vars.

@nejch
Copy link

nejch commented May 27, 2022

@lyswhut I understand and this is a fairly common scenario, I'm just saying the standard way to do it is to define those hostnames, domains, or IPs in NO_PROXY - you'll find more details in the links I provided above in one of my early comments.

E.g. popular clients:
https://www.oreilly.com/library/view/security-with-go/9781788627917/5ea6a02b-3d96-44b1-ad3c-6ab60fcbbe4f.xhtml
https://requests.readthedocs.io/en/latest/user/advanced/#proxies
https://www.python-httpx.org/advanced/#environment-variables

I have no strong opinion of course, just wanted to share this, obviously you might want the option to do things differently programmatically :)


@tomas thanks for the response! Currently needle does an exact URL match out of the comma-separated list provided in no_proxy. But usually no_proxy is used with domains to catch all kinds of URLs e.g. for .yourcompany.com

So for example if no_proxy is set to:

  • * wildcard: never proxy anything
  • git.example.com, exact match -> should not proxy https://git.example.com or http://git.example.com
  • .example.com -> should not proxy git.example.com, org.git.example.com on http or https, etc.
  • .git.example.com -> should not proxy subdomains of those etc etc.

Someone already pasted the request implementation earlier:
https://github.com/request/request/blob/master/lib/getProxyFromURI.js

This may be an even clearer approach from proxy-from-env:
https://github.com/Rob--W/proxy-from-env/blob/96d01f8fcfdccfb776735751132930bbf79c4a3a/index.js#L62-L106

I don't really do JS on a daily basis so maybe you'd be faster to come up with something, not sure :)

@lyswhut
Copy link

lyswhut commented May 27, 2022

@nejch I understand and this is a fairly common scenario, I'm just saying the standard way to do it is to define those hostnames, domains, or IPs in NO_PROXY - you'll find more details in the links I provided above in one of my early comments.

As you said, this is very common for an application, but Needle is just a request library, it is only one of the dependencies of the application, it is not provided to the end user,
So it should be controlled by the application that depends on it whether to use environment variables,
In fact, whether to obtain proxy information from environment variables should be done by the application that depends on it, or provide an option for the application to decide whether to enable this feature

In my experience, the behavior of reading the environment variable proxy is generally only used in CLI programs

@nejch
Copy link

nejch commented May 27, 2022

Hmm, I've had the opposite experience as it really is baked into some other popular languages' stdlibs (even libcurl respects it, and the links above were also of popular request libraries), but if this is the more idiomatic approach in node then that makes sense, I'm definitely not a JS expert here :) I have seen proxy logic get reinvented a few times in JS projects, I guess node folks had their reasons to exclude it.

@tomas the no_proxy discussion maybe belongs more in #383, sorry for the detour here :P

@tomas
Copy link
Owner Author

tomas commented May 27, 2022

No worries @nejch, the discussion is worth it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support env HTTP_PROXY, HTTPS_PROXY
5 participants