Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build URLs using urljoin #1217

Closed
wants to merge 1 commit into from
Closed

Build URLs using urljoin #1217

wants to merge 1 commit into from

Conversation

cgoldberg
Copy link
Member

@cgoldberg cgoldberg commented Jan 6, 2020

This PR makes a small change to how URL's are created when a Host is specified. Previously, the URL was built by using string concatenation. This changes it to use urljoin.

This fixes the edge case where a host is entered containing a trailing slash (i.e. https://example.com/ rather than https://example.com). Once merged, it should work with or without trailing a slash.

supersedes #1216

@cgoldberg
Copy link
Member Author

arg.. CI is failing with this change.. need to investigate when I have a few mins.

@TBBle
Copy link
Contributor

TBBle commented Jan 7, 2020

The second test_wrong_url case failed, because

assert urljoin("telnet://127.0.0.1", "/") == '/'

instead of

assert "%s%s" % ("telnet://127.0.0.1", "/") == 'telnet://127.0.0.1/'

which changed the failure exception from requests.

This is because urljoin knows you can't put a path on a telnet: URL, and since it's job is to fill in missing bits on the second parameter by taking from the first, it does that by taking nothing from the first.

I suspect urljoin might be the wrong thing to use here, unless we can validate the input earlier so that we get a meaningful exception with the input value, rather than a complaint about a / that was not a parameter provided by the user/caller.

Also, I'm suspicious that this is doing the wrong thing anyway, as if the base_url contains a path and does not end in a /, it now loses the last path element, again due to how urljoin is defined.

>>> urljoin("http://127.0.0.1/root/path/", "extra")
'http://127.0.0.1/root/path/extra'
>>> urljoin("http://127.0.0.1/root/path", "extra")
'http://127.0.0.1/root/extra'

I suspect this change will break examples/browse_docs_test.py, for example, due to how urljoin handles absolute paths on the second parameter.

>>> urljoin("https://docs.locust.io/en/latest/", "/")
'https://docs.locust.io/'

It might be better to just use urlsplit, modify the path carefully, and urlunsplit perhaps. That would also let the base_url contain query parameters, if that was a sensible thing to do...

@cyberw
Copy link
Collaborator

cyberw commented Jan 15, 2020

@cgoldberg Looks nice! can you also fix it for FastHttpLocust at the same time?

@vstepanov-lohika-tix
Copy link
Contributor

vstepanov-lohika-tix commented Jan 29, 2020

@cgoldberg
Sorry, this change will break a lot of existing locust tests and shouldn't be merged,
It was already in master and was reverted due to critical issue that was detected soon after that.
#1134
And was reverted in PR:
#1148

Please see my comment in this PR:
Unfortunately, I've found a big disadvantage of this change:
urljoin from urllib has a surprising behavior: it cuts everything in base url after "/", if the second part of url to join starts from "/".
For example, urljoin("http://mysite.com/api/v0", "/path/to/join")" returns a result: http://mysite.com/path/to/join`

@heyman
Copy link
Member

heyman commented Jan 29, 2020

I agree that we cannot merge this as is.

I think that at some point we should deprecate host and rename it to base_url. At the same time we could change the behaviour.

@vstepanov-lohika-tix
Copy link
Contributor

I suggest to add ToDo above this code for a future with note something like: "Issue with urljoin, requires refactoring"

@TBBle
Copy link
Contributor

TBBle commented Jan 30, 2020

urljoin is in-effect a relative URL resolver. So it only makes sense to use in those situations, which isn't (from my own reading) how this code is documented to work. I would not often expect to call an API starting with .. or even without /.

The 'prepending' behaviour documented for this code is much-more similar to how OpenAPI uses "base url", to resolve named APIs, and is probably therefore a less-surprising result.

In our case, base_url encapsulates "Some mod_rewrite rule on the firewall, applied to the front of the path in the API documentation".

I would normally write any base_url I was discussing to end with / (because we're going to append to it, and a root URL ends in /) and an API I was discussing start with a / (because it's the base of the API namespace), so the need to not include a // at the resulting join-point is real, I agree. Just that urljoin is not the right behaviour for that join-point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants