Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in set_scheme for 2.1.1 vs 2.1.0 #577

Open
alexcrichton opened this issue Jan 9, 2020 · 23 comments
Open

Regression in set_scheme for 2.1.1 vs 2.1.0 #577

alexcrichton opened this issue Jan 9, 2020 · 23 comments

Comments

@alexcrichton
Copy link
Contributor

Cargo's internal test suite is unfortunately failing after the update to url 2.1.1, and one thing we've narrowed down so far is that the behavior of this changed between 2.1.0 and 2.1.1:

use url::Url;

fn main() {
    let mut url = Url::parse("git://github.com/foo/bar").unwrap();
    println!("{:?}", url.set_scheme("https"));
}

On 2.1.0 this succeeded but on 2.1.1 this is now failing.

Is this an intended change or perhaps an accidental bug? If it's intended, is there a way we can get this working?

@alexcrichton
Copy link
Contributor Author

Er sorry, and to be clear, it's set_scheme that's changing behavior, parse works on both versions.

@jdm
Copy link
Member

jdm commented Jan 9, 2020

This looks like a deliberate change in 7efdc53.

@alexcrichton
Copy link
Contributor Author

In the case that this is deliberate, is there perhaps a workaround that we can implement?

@jdm
Copy link
Member

jdm commented Jan 9, 2020

let url_str = "git://github.com/foo/bar";
let git_url = Url::parse("git://github.com/foo/bar").unwrap();
let https_url = Url::parse(url_str.replace("git", "https")).unwrap();

Would that work for you?

@SimonSapin
Copy link
Member

If you want to go with string manipulation, consider something like ["https", &git_url[url::Position::AfterScheme..]].join() in case the substring https occurs in some other part of the URL.

@tmccombs
Copy link
Contributor

  1. What was the motivation for this change?
  2. Should such a breaking change be part of a patch release?
  3. The documentation only says set_scheme fails if:
    • The new scheme is not in [a-zA-Z][a-zA-Z0-9+.-]+
    • This URL is cannot-be-a-base and the new scheme is one of http, https, ws, wss, ftp, or gopher
      And I don't think either is the case here. If this is intended behavior should the documentation be updated?

alexcrichton added a commit to alexcrichton/cargo that referenced this issue Jan 10, 2020
bors added a commit to rust-lang/cargo that referenced this issue Jan 10, 2020
Fix tests with `url` crate update

Works around servo/rust-url#577
@valenting
Copy link
Collaborator

1. What was the motivation for this change?

Getting all the URL web-platform tests to pass

2. Should such a breaking change be part of a patch release?

In retrospect it would have been probably better to be in a dot release, but it wasn't immediately obvious at the time that we were breaking a common use case.

3. The documentation only says set_scheme fails if:
     And I don't think either is the case here. If this is intended behavior should the documentation be updated?

Yes, we should update the docs to reflect all of the cases in https://url.spec.whatwg.org/#scheme-state

@dekellum
Copy link
Contributor

So yank 2.1.1 and release 2.2.0, already.

@SimonSapin
Copy link
Member

What would that achieve? Cargo treats them the same.

@dekellum
Copy link
Contributor

No it doesn't. And if it did, SemVer is still worth following.

@SimonSapin
Copy link
Member

https://semver.org/#spec-item-8

Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API.

Does a “change […] to the public API” also include behaviors, or only signatures?

For a library documented as implementing a living standard (which doesn’t have version numbers and changes over time), should changing the behavior to catch up with a specification change be “considered backwards-incompatible”?

@dekellum
Copy link
Contributor

I think 2.2.0 is acceptable for these changes to the extent that I understand them (and also at least potentially meaningful to users, or at least those that are free thinking and inclined to use constraints such as url = ">=2.1.0, < 2.2" or url = "~2.1.0").

But 3.0.0 is of course fine as well, if you feel its necessary/warranted. Thanks!

@SimonSapin
Copy link
Member

Sorry I wasn’t clear. I’m saying I think we shouldn’t consider catching up with the spec as backwards-incompatible, and therefore SemVer still applies to API signatures in this crate.

And yes, when I said Cargo treats 2.2.0 and 2.1.0 the same I meant for users that use the default caret dependedencies.

@dekellum
Copy link
Contributor

dekellum commented Jan 13, 2020

Your argument seems strange and conflicting to me: The spec is living so we can't have mature release management of an implementation of the spec?

I agree with @tmccombs above. Intuitively this was a pretty large set of changes and I think its atleast in the spirit of bumping MINOR, for the same reason that Semver requires doing this for new features.

@tmccombs
Copy link
Contributor

Does a “change […] to the public API” also include behaviors, or only signatures?

My interpretation is that a change in behavior of a public function that breaks existing applications is a "backwards incompatible change to the public API". Unfortunately, the semver spec is not particularly precise on what "backwards incompatible" means, and any bug fix changes behavior, almost by definition, so clearly you can't say all behavior changes require a new major release. On the other hand, this change did clearly break at least one application.

I’m saying I think we shouldn’t consider catching up with the spec as backwards-incompatible, and therefore SemVer still applies to API signatures in this crate.

Even if the living spec has backwards incompatible changes?

@bstrie
Copy link

bstrie commented Feb 5, 2020

  1. What was the motivation for this change?
    Getting all the URL web-platform tests to pass

@valenting , can you elaborate? Is it the intention of this crate that it should be impossible to use set_scheme to convert between special and non-special URLs? As noted the documentation for set_scheme says:

Do nothing and return Err if:

  • The new scheme is not in [a-zA-Z][a-zA-Z0-9+.-]+
  • This URL is cannot-be-a-base and the new scheme is one of http, https, ws, wss, ftp, or gopher

The behavior as currently implemented suggests the following additional bullet point: "Exactly one of the old or new schemes is one of http, https, ws, wss, ftp, or gopher". But is it really the intention of the platform tests that interconverting between special and non-special schemes should be prohibited?

EDIT: I see that the URL spec says "If url’s scheme is a special scheme and buffer is not a special scheme, then return."/"If url’s scheme is not a special scheme and buffer is a special scheme, then return.", although it's a bit frustratingly short on why this has to be. If we're certain that this is the correct behavior then I'll come up with a docs PR.

@bstrie
Copy link

bstrie commented Feb 5, 2020

I've submitted a docs PR at #585 to elaborate on the error conditions being exhibited here.

I can also confirm that the upgrade from 2.1.0 to 2.1.1 exhibited breakage in our application as well, which I only discovered recently by wondering why one of our Cargo.tomls had url = "=2.1.0". I tend to agree that this sort of thing is something that would ordinarily be encapsulated by a major 3.0.0 release (which doesn't have to be a massive ecosystem-breaking change, with the proper application of the semver trick (in fact, I think these sorts of just-behavioral, not-type-level breaking changes are the ideal application of the semver trick)).

@tmccombs
Copy link
Contributor

tmccombs commented Feb 6, 2020

although it's a bit frustratingly short on why this has to be. If we're certain that this is the correct behavior...

my guess is it probably has something to with the fact that WHATWG is primarily focused on browsers and the "special" schemes are especially special in the context of browsers.

I wonder if maybe there should be a more lenient API for urls (maybe a separate crate possibly that rust-url uses) where methods like setScheme infallibly set the appropriate component, and parsing accepts weird cases like that in #581.

@bstrie
Copy link

bstrie commented Feb 6, 2020

As annoying as this is to workaround, I'd be a little worried about forging ahead with defining a "lenient mode" without explicitly defining the extent of the deviation from the spec. At minimum we might consider some API that has no notion whatsoever of "special" schemes (whose existence the spec frustratingly neglects to justify), though without knowing why the spec makes this distinction I can't begin to determine if that's a good idea, and that also wouldn't help #581

Deluvi added a commit to sonos/dinghy that referenced this issue Feb 13, 2020
A regression in the `url` patch 2.1.1 caused cargo to panic when
compiling crates with git ssh dependencies. This is described in
servo/rust-url#577. A workaround has been deployed in cargo 0.43, but
the version is not yet released. On the meantime, dinghy should use the
url version 2.1.0.
kali pushed a commit to sonos/dinghy that referenced this issue Feb 13, 2020
A regression in the `url` patch 2.1.1 caused cargo to panic when
compiling crates with git ssh dependencies. This is described in
servo/rust-url#577. A workaround has been deployed in cargo 0.43, but
the version is not yet released. On the meantime, dinghy should use the
url version 2.1.0.
@alercah
Copy link

alercah commented Apr 25, 2020

Unfortunately, the semver spec is not particularly precise on what "backwards incompatible" means, and any bug fix changes behavior, almost by definition, so clearly you can't say all behavior changes require a new major release.

In the Rust ecosystem, RFC 1105 is basically canonical, and something based on it (perhaps brought up to date a bit) is planned for the Cargo book.

Semver does have some other applicable guidance.

I wonder if maybe there should be a more lenient API for urls (maybe a separate crate possibly that rust-url uses) where methods like setScheme infallibly set the appropriate component, and parsing accepts weird cases like that in #581.

I just wandered into this madness today and I'm inclined to agree. It's particularly baffling to me that it fails silently. This seems like the wrong behaviour. I also agree that bringing this up with the spec folk is likely a good idea; personally, this being my first day looking at the spec, there's a lot to not like about it and the apparently particular focus on web use cases, as opposed to the idea of fully general URLs that was originally envisioned, IMO hurts the spec.

@djc
Copy link
Contributor

djc commented Aug 20, 2020

Hi all, I've been added as a new maintainer for this library, so I'm trying to figure out how we can do this better in the future. I've just pushed some more changes to align with the spec, so we again have the risk of breaking downstream crates (although I think the changes are smaller this time around than they appear to have been before).

I tend to agree that this sort of thing is something that would ordinarily be encapsulated by a major 3.0.0 release (which doesn't have to be a massive ecosystem-breaking change, with the proper application of the semver trick (in fact, I think these sorts of just-behavioral, not-type-level breaking changes are the ideal application of the semver trick)).

@bstrie I read https://github.com/dtolnay/semver-trick but it leaves me a little unclear on how we would use this to, for example, update url 2.x to be compatible with 3.x while 3.x would have incompatible behavior only for the set_scheme() method. Can you elaborate on how you see this working?

I largely do think it does make sense to make semver-compatible releases to realign this crate with the upstream living standard (that's what living standards are for). Of course if this results in large scale breakage that is pretty bad, but I'm not sure it will always be easy to foresee the level of breakage.

At minimum we might consider some API that has no notion whatsoever of "special" schemes (whose existence the spec frustratingly neglects to justify), though without knowing why the spec makes this distinction I can't begin to determine if that's a good idea, and that also wouldn't help #581

This also makes sense to me: both in finding a way to have a way of ignoring/eliminating the "specialty" of some schemes in this crate, although it would be useful to get some more guidance first from the spec (repo/authors) on why the distinction exists in the first place. @annevk can you comment on this off the top of your head? If not, it would be useful if someone could do some archaeology on the spec repo to figure out when/how that was introduced. (I could do it, but it would be great if someone else could contribute some time/energy towards this.)

@annevk
Copy link

annevk commented Aug 21, 2020

The distinction exists because syntax and semantics are intertwined for those schemes, largely due to historical implementation choices that have become cemented due to network effects. So instead of operating on the output of the parser, those schemes require the parser to do something special.

Having said that, there's nothing precluding a Rust implementation from offering APIs outside what the specification offers (or being stricter in terms of error handling for some of its callers), but that might not be the focus of this particular crate.

@tmccombs
Copy link
Contributor

I wonder if it would make since to have some way for users of rust-url to specify that a scheme is "special". Perhaps a method on ParseOptions either to force "specialness", or give a list of additional schemes to treat as "special"?

cross-posted from #556

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants