HTTP POST and PUT not working since data rewind is not supported #199

Closed
genotrance opened this issue Dec 4, 2023 · 1 comment

Comments

@genotrance
Owner

If an HTTP client connected to Px wants to perform an HTTP POST or PUT (not HTTPS/CONNECT), libcurl connects to the upstream proxy and makes the request without sending a Proxy-Authorization header, since it doesn't know auth is needed. Px does know, of course, but I'm not sure how to tell libcurl to send the header. It would be nice to save that round trip, but that's not the main issue here.

Curl info: Hostname xyz.com was found in DNS cache
Curl info: Trying 1.2.3.4:80...
Curl info: Connected to xyz.com (1.2.3.4) port 80
Sent header => PUT http://httpbin.org/put?param1=val1 HTTP/1.1
Sent header => Host: httpbin.org
Sent header => User-Agent: mcurl v0.9.0
Sent header => Proxy-Connection: Keep-Alive
Sent header => Accept: */*
Sent header => Content-Length: 36
Read 36 bytes
Read 0 bytes

libcurl also uploads the data as expected for that POST/PUT, as can be seen above (36 bytes). However, the upstream proxy wants Px to authenticate, so it sends back a 407 with Proxy-Authenticate headers.

Received header <= HTTP/1.1 407 authenticationrequired
Suppressing headers
Received header <= Date: Fri, 01 Dec 2023 17:39:51 GMT
Received header <= Content-Type: text/html
Received header <= Cache-Control: no-cache
Received header <= Content-Length: 4237
Received header <= X-Frame-Options: deny
Received header <= Proxy-Connection: Keep-Alive
Received header <= Proxy-Authenticate: NTLM
Received header <= Proxy-Authenticate: Basic realm="McAfee Web sanitized len(8)
Resuming headers
Curl info: Rewind stream before next send                             <=== REWIND expected
Curl info: Keep sending data to get tossed away
Curl info: Ignoring the response-body
Read 0 bytes
Curl info: Connection #5 to host xyz.com left intact

libcurl understands that it needs to authenticate and clearly tells us that, since it now has to retry the PUT, it must resend the data. It therefore expects the client (Px in this case) to implement CURLOPT_SEEKFUNCTION.

Curl info: Issue another request to this URL: 'http://httpbin.org/put?param1=val1'
Curl info: Found bundle for host: 0x2028b35b700 [serially]
Curl info: Re-using existing connection with proxy xyz.com
Curl info: necessary data rewind wasn't possible                      <=== Not implemented, so out of luck
Curl info: Closing connection

The problem is that Px simply pipes the POST/PUT data from the client socket directly to the upstream proxy server. Once that data is sent, there's no way for Px to go back to the client and ask it to resend. We do not want to send the client a 407, since Px is doing the authentication on behalf of the client: the client won't have any credentials to send and will error out. Nothing that I'm aware of in the HTTP spec, besides the 407, lets you tell the client to resend data.

I also do not want to cache the client data, since it could be a large amount. And considering Px supports multiple clients across processes/threads, it's probably not the most comforting idea. What's worse, even if we do that and retry against the proxy, once we send the Proxy-Authorization header the proxy will challenge Px again, after which the data has to be sent a third time. I somehow feel this is wrong and wonder if proxies handle this more efficiently. I don't see an obvious way within libcurl to improve the situation, but am hoping the libcurl community has a suggestion. No doubt, plain HTTP POST/PUT is not common in this day and age where everything needs to be secure, but this still needs a sensible solution. Uploading 3 times seems ridiculous.
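For reference, here is roughly what a bounded cache behind CURLOPT_SEEKFUNCTION could look like if caching were ever accepted. This is only a sketch, not something Px does: `RewindableBody`, `max_bytes` and `mem_bytes` are hypothetical names, and the three constants mirror libcurl's documented seek callback return values. The body is spooled to a temp file past `mem_bytes`, and caching is abandoned past `max_bytes`, at which point a rewind is reported as impossible.

```python
import tempfile

# Return values libcurl documents for a CURLOPT_SEEKFUNCTION callback.
CURL_SEEKFUNC_OK = 0
CURL_SEEKFUNC_FAIL = 1
CURL_SEEKFUNC_CANTSEEK = 2


class RewindableBody:
    """Hypothetical helper: remembers uploaded bytes so they can be replayed."""

    def __init__(self, source, max_bytes=1 << 20, mem_bytes=64 * 1024):
        self._source = source        # file-like object wrapping the client socket
        self._max_bytes = max_bytes  # assumption: hard cap on cached data
        # Kept in memory up to mem_bytes, then rolled over to a temp file.
        self._spool = tempfile.SpooledTemporaryFile(max_size=mem_bytes)
        self._cached = 0             # bytes stored in the spool so far
        self._overflow = False       # True once the cap was exceeded

    def read(self, size):
        """Read callback: return up to `size` bytes, b"" at end of data."""
        # After a rewind, replay cached bytes before touching the client again.
        chunk = self._spool.read(size)
        if chunk:
            return chunk
        chunk = self._source.read(size)
        if chunk and not self._overflow:
            if self._cached + len(chunk) > self._max_bytes:
                self._overflow = True      # too large: stop caching
            else:
                self._spool.write(chunk)   # spool position stays at its end
                self._cached += len(chunk)
        return chunk

    def seek(self, offset, origin):
        """Seek callback: only absolute seeks into cached data are supported."""
        if self._overflow or origin != 0:  # 0 is SEEK_SET
            return CURL_SEEKFUNC_CANTSEEK
        if offset < 0 or offset > self._cached:
            return CURL_SEEKFUNC_FAIL
        self._spool.seek(offset)
        return CURL_SEEKFUNC_OK
```

With bindings that expose them, `read` would be registered as the read callback and `seek` as the seek callback; how Px's own libcurl wrapper would wire that up is not shown here. It also does nothing about the number of uploads, it only makes the retries possible.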

Somehow this worked in my testing all this time because my upstream proxy was not challenging the POST/PUT after the initial authenticated GET and CONNECT in my test script. This seems to have changed, likely because the upstream proxy was updated to improve security. Nonetheless, that was pure luck and this is broken. Worse, it disables Px altogether, since Px treats libcurl error 65 (CURLE_SEND_FAIL_REWIND) as a proxy auth failure and stops any further queries to avoid locking out the account.

So basically three issues in order of importance:

  • Handle the rewind and re-uploading of data - ideally without caching within Px
  • Handle the libcurl error 65 correctly and not disable the upstream proxy altogether
  • Reduce the number of round trips by forcing libcurl to send a Proxy-Authorization header in the first round
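The second item could be approached by separating "the rewind failed" from "the credentials were rejected". A minimal sketch, assuming a final 407 with no libcurl error is what a real auth failure looks like; `Outcome` and `classify` are hypothetical names and not Px's actual code. Only the value 65 (CURLE_SEND_FAIL_REWIND) and 0 (CURLE_OK) come from libcurl.

```python
from enum import Enum

CURLE_OK = 0
CURLE_SEND_FAIL_REWIND = 65  # libcurl: sending the data required a rewind that failed


class Outcome(Enum):
    OK = "ok"
    REQUEST_FAILED = "request_failed"  # fail this one request, keep the proxy enabled
    AUTH_FAILED = "auth_failed"        # stop further queries to avoid an account lockout


def classify(curl_code, http_status):
    """Map a finished transfer to an outcome (hypothetical policy)."""
    if curl_code == CURLE_SEND_FAIL_REWIND:
        # The body could not be replayed; this says nothing about the credentials.
        return Outcome.REQUEST_FAILED
    if curl_code == CURLE_OK and http_status == 407:
        # Assumption: the proxy still demands auth after libcurl finished its attempts.
        return Outcome.AUTH_FAILED
    if curl_code == CURLE_OK:
        return Outcome.OK
    return Outcome.REQUEST_FAILED
```

For the transfer in the logs above, `classify(65, 407)` would give `Outcome.REQUEST_FAILED`, so one failed PUT would not disable the upstream proxy for every later request.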

Hoping the libcurl community has some ideas to solve this.

@genotrance genotrance added the bug label Dec 4, 2023
@genotrance
Owner Author

Per discussion with @bagder, this issue occurs when libcurl is configured with ANY as the auth mechanism (this is the default for Px; users can change it with --auth). ANY prompts libcurl to discover the auth mechanism, which is why it does not send the Proxy-Authorization header initially. Specifying NTLM or NEGOTIATE avoids the problem. But most users don't know or care what auth mechanism their proxy uses, so ANY is a reasonable default.

Meanwhile, ever since Px started leveraging libcurl, it has no longer been possible to find out which auth mechanism was discovered, since no such libcurl API exists today. Previously, Px did the discovery itself and could cache that info. Once the first transaction with the upstream proxy was completed, we could just reuse it. That saved the extra back and forth and resulted in faster performance overall. An issue will be opened against libcurl to add this functionality in a future version; @bagder was receptive to the idea.

Until then, and considering Px has to work with many older versions of libcurl, the auth mechanism will need to be deciphered from the headers returned by the upstream proxy, which Px sees in DEBUGFUNCTION. Today those headers are only logged when --debug is enabled. Caching the mechanism from them will not only reduce the likelihood of this issue but also improve performance.
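As a rough illustration of that header parsing, the sketch below pulls the offered schemes out of Proxy-Authenticate lines and picks one to cache. `offered_auth_schemes`, `pick_scheme` and the `PREFERENCE` order are hypothetical, the input is assumed to be already-decoded header lines as seen in the debug callback, and a header carrying several comma-separated challenges is not handled. Note that this yields what the proxy offers, which is only a guess at what libcurl actually ends up using.

```python
# Assumption: prefer the stronger mechanisms first when several are offered.
PREFERENCE = ("NEGOTIATE", "NTLM", "DIGEST", "BASIC")


def offered_auth_schemes(header_lines):
    """Return the auth schemes named in Proxy-Authenticate headers, in order."""
    schemes = []
    for line in header_lines:
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "proxy-authenticate":
            continue
        parts = value.split()
        if not parts:
            continue
        scheme = parts[0].upper()  # first token is the scheme, the rest are parameters
        if scheme not in schemes:
            schemes.append(scheme)
    return schemes


def pick_scheme(schemes):
    """Choose the scheme to cache, or None if nothing recognised was offered."""
    for candidate in PREFERENCE:
        if candidate in schemes:
            return candidate
    return None
```

Fed the two Proxy-Authenticate headers from the 407 response in the first comment, `offered_auth_schemes` returns `["NTLM", "BASIC"]` and `pick_scheme` returns `"NTLM"`, which could then be used in place of ANY for later requests.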

While this does not fundamentally fix the issue, it seems like a good enough mitigation: the first transaction to an upstream proxy is more likely to be a GET than a POST/PUT, and that initial call can cache the auth mechanism. In fact, CONNECT is far more likely since most sites use HTTPS anyway, so the general likelihood of this issue does not warrant full-scale caching of POST/PUT data and implementing support for rewind.
