-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Version 1.0, rewrite the data structure and API #176
Conversation
WebIdl is a curious name; maybe DomApi? |
Would we be better off calling this version 0.7.0 and waiting for it to bake in before calling it 1.0.0? |
Is the copyright statement correct? IANAL but I think it should be copyrighted to Mozilla, at least for the work done as an employee. |
Since rust-url isn't on reviewable, do you want line comments via github? |
Wow, thanks for all this @SimonSapin! (especially the updates to some crates) These changes all sound great to me from an API perspective, should certainly allow for a good amount of flexibility in rust-url after 1.0 has happened (without making a new major release). I might recommend what @asajeffrey was mentioned, however, of holding off on 1.0 for now until there's some experience with this rewrite. Similarly libc avoided going to 1.0 initially and instead went to 0.2, and now that we're confident we'll move it to 1.0 when we can. That being said it's up to you. You've done a lot of the update work so you've probably got quite a good handle on what the API is like :) |
I’ve added a "Downsides" sections to the initial message. Setters are ugly. @Ms2ger I’m ok with renaming it, though https://url.spec.whatwg.org/ doesn’t use "DOM" for this at all. @asajeffrey rust-url is on reviewable, but I hid the link since I thought we might want to hold a full code review (it’s kinda big) until the API is settled. But feel free to use https://reviewable.io/reviews/servo/rust-url/176 if you prefer. Re copyright, I have no idea. I don’t mind changing it if someone can provide legal advice. I vaguely remember that "authorship" (~copyright) can not be transferred in French law. (Though usage rights can be licensed I think?) For the version number, I don’t mind making it 0.6 or something. 1.0 seemed appropriate to signal that it is "more breaking" than 0.2 -> 0.3 -> 0.4 -> 0.5 have been so far, and that crate is starting to be more "mature". I don’t think it matters a lot if future semver-breaking changes as needed are 0.6 -> 0.7 -> 0.8 or 1.0 -> 2.0 -> 3.0. However, I’d rather not make a semver-breaking release later gratuitously just to add a "not 0.x" stamp. |
Humongous 👍 from me. If I could ask for 1 additional getter, it would be |
@seanmonstar would |
@SimonSapin I believe that works. Is there desire to not add an |
Making sure this doesn't get lost in IRC... It might be nice to have: impl Url { fn query_params<'a>(&'a self) -> QueryParams<'a> { ... } }
impl<'a> Iterator<(Cow<'a,str>,Cow<'a,str>)> for QueryParams<'a> { ... } so people who want to can, e,g, build a hash table of the query parameters. |
@seanmonstar the idea is to have one generic mechanism (indexing) rather than a proliferation of ad-hoc methods for every combinations that various use cases might require. |
I haven't gone through all the changes, but the old scheme_type_mapper was quite useful for Gecko integration, as chrome:// resource:// and a bunch of others work just like regular special protocols. |
I think this is great overall! I think the indexing mechanism is interesting, and I think there is value in having it be general to this level, but I fear that not having some getters/helpers may be a negative impact on ergonomics. At least for the common components such as those currently delineated by the Instead of thinking about what component someone wants and just using it, they'll have to know/lookup the "indexing language" in order to determine which indexing positions are appropriate, which may not always be immediately obvious for more obscure components. Sure, the positions are pretty self-explanatory, but I still think it incurs a context-switch. What do you think about having something like a Failing that, I think a good compromise would be to continue to have some getters/helpers for the most basic/common components, while still being able to access pretty much any component thanks to the more general indexing mechanic you introduced. We could define these in a separate module as free functions or in some kind of |
@valenting With whatwg/url@b266a43, the parsing algorithm in the spec changed to better support unknown schemes like
The spec still has a concept of special scheme which is just |
They don’t! Sorry this was unclear, impl Url {
pub fn scheme(&self) -> &str { … }
pub fn username(&self) -> &str { … }
pub fn password(&self) -> Option<&str> { … }
pub fn host_str(&self) -> Option<&str> { … }
pub fn port(&self) -> Option<u16> { … }
pub fn path(&self) -> &str { … }
pub fn query(&self) -> Option<&str> { … }
pub fn fragment(&self) -> Option<&str> { … }
} (There are corresponding Indexing is for when you want more than one consecutive component, like "authority" (username, password, host, and port) or "path and query". You can get a |
@SimonSapin My apologies! I skimmed through the changes and completely missed that :( Great work! |
@SimonSapin That's great then! I think it shouldn't be too hard to update the C API. |
@valenting Yes, setters have some common patterns that probably can be refactored to share code. What alternative would there be to keeping indexes around? |
It fails to build for me on Windows.
Here's my relevant
|
@SimonSapin The only alternative would be keeping the segments as separate objects, which means more allocations, and worse getter performance, so this is probably the best option. PS. The bits in slicing.rs are simply wonderful! |
As a bit of data for the position API, the uses of it in Servo are: url[Position::BeforePath..Position::AfterQuery]
url[..Position::AfterPath]
url[Position::BeforePath..]
url[..Position::BeforeFragment] and: Url::parse(&load_data.url[Position::BeforeUsername..]) The last one is a bit odd because the URL is being serialized then parsed, which seems like wasted effort. Perhaps an API like: Url::before_foo() -> Url
Url::after_foo() -> Url would do the trick? So |
PS: In order to avoid lots of copying, the above API would need us to change the backing store from a |
Say no to Cow, and yes to |
@bors-servo r=asajeffrey (Again after hitting "Synchronize" on http://build.servo.org/homu/queue/rust-url) |
📌 Commit 4a59d93 has been approved by |
Version 1.0, rewrite the data structure and API **Do not merge yet**, I’d like to get feedback on API design. This is a large rewrite of the data structures and API that aims to eventually become version 1.0.0. Motivations are: * I designed rust-url when I was relatively new to Rust, coming from Python where memory allocation and ownership are not concerns. The URL Standard also writes algorithms to optimize for human comprehension, not computer resources. As a result, each component of the `Url` struct is a `String` or a `Vec<String>`. * At the same time, I tried to use enums to enforce some invariants in the type system. For example, “non-relative” like `data:` don’t have a host or port number. In practice I think this makes the API a pain to use more than it’s worth. * The spec [has changed](whatwg/url@b266a43) a lot (notably around dealing with protocols/schemes outside the small list it knows about), and I’m not sure the API can be adapted backward-compatibly anyway. I think the API can be more "rusty", and a bunch of long-standing issues be fixed along the way. The big idea is that the `Url` struct now contains a single `String` of the URL’s serialization (the only heap memory allocation), plus some indices into it (and some other meta-data). The indices allow accessor methods for URL components to return `&str` in O(1) time. The fields of `Url` are now private, as they need to maintain a number of invariants. Getting invariants wrong is not a memory-safety hazard (e.g. string indexing is still checked) but can cause non-sensical results. Rather than multiple ad-hoc methods like `serialize_authority` and `serialize_without_fragment`, `Url` implements indexing with ranges of `Position`, which is an enum indicating a position between URL components. For example, `serialize_without_fragment` is now `&url[..Position::AfterQuery]`, takes O(1) time, and returns a `&str` slice. While we’re making breaking changes, this is an opportunity to clean up APIs that didn’t turn out so great: * Drop `UrlParser`. The builder pattern works well for `std::process::Command`, but for URLs in practice almost only `base_url` is ever used. v0.5.2 already has [`base_url.join(str)`](https://github.com/servo/rust-url/blob/v0.5.2/src/lib.rs#L867-L874), which stays the same. * Drop all the `lossy_percent_decode_*` methods. They don’t seem to be used at all. Maybe accessors could return a `PercentEncodedStr` wrapper that dereferences to `&str` and has a `percent_decode()` method? * In the `percent_encoding` module, there was “duplicated” APIs that returned a new `String` or `Vec<u8>` for convenience, or pushed to an existing `&mut String` or `&mut Vec<u8>` for flexibility. This was a bit awkward, so I’ve changed it to return iterators of `char` or `u8` which can be used with `.extend(…)` or `.collect()`. I’m not sure this an improvement and might revert it. Feedback appreciated. This new design is great for accessing components, but not so much for modifying them. We need to make changes in the middle of the string, and with percent-encoding we don’t know the encoded size of the new thing until we’ve encoded it. Then we need to fix up indices. As a result, the `set_*` methods have lots of fiddly code where it’s easy to make subtle mistakes. This code should be well-tested before this PR is merged, but it’s not at all yet. To see what the new API looks like more easily than by reading the code, you can find the output of `cargo doc` at https://simonsapin.github.io/rust-url-1.0-docs/url/ Too see what the API usage is like in practice, I’ve updated some crates that use rust-url for this PR: * https://github.com/alexcrichton/git2-rs/pull/115 * rust-lang/cargo#2428 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 * servo/servo#9840 Since this PR isn’t merged yet, to do the same in your crates you’ll need a clone of the git repository at the right branch: ``` git clone https://github.com/servo/rust-url --branch 1.0 ``` … and a Cargo [path override](http://doc.crates.io/guide.html#overriding-dependencies) in the directory of your crate: `.cargo/config` ``` path = [ "../rust-url", ] ``` Since we’re making breaking changes, this is the opportunity to make *more* (or different) breaking changes before version 1.0 is published and we commit to this new API. Please comment here to say what are your pain points with the 0.5.x API, what could be improved, what I’ve done in this PR that doesn’t seem like a good idea, etc. Thanks! <!-- Reviewable:start --> <!-- (Simon) Hidden for now [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/176) --> <!-- Reviewable:end -->
☀️ Test successful - travis |
🎊 🎊 🎊 🎊 Thanks everyone! Please don’t |
@SimonSapin did the issues with |
@sfackler That’s part of what I’ll look at tomorrow before publishing. |
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * servo/rust-url#176 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * servo/rust-url#176 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * servo/rust-url#176 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * servo/rust-url#176 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * servo/rust-url#176 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Aaaaaand we’re live! https://crates.io/crates/url/1.0.0 |
Awesome work @SimonSapin! 🎊 |
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * <s>https://github.com/servo/rust-url/pull/176</s> * <s>https://github.com/alexcrichton/cookie-rs/pull/42</s> * <s>https://github.com/hyperium/hyper/pull/740</s> * https://github.com/cyderize/rust-websocket/pull/70 * mozilla/webdriver-rust#28 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
Update to rust-url 1.0 **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * <s>https://github.com/servo/rust-url/pull/176</s> * <s>https://github.com/alexcrichton/cookie-rs/pull/42</s> * <s>https://github.com/hyperium/hyper/pull/740</s> * https://github.com/cyderize/rust-websocket/pull/70 * mozilla/webdriver-rust#28 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/servo/9840) <!-- Reviewable:end -->
aah, so |
…sajeffrey **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * <s>https://github.com/servo/rust-url/pull/176</s> * <s>https://github.com/alexcrichton/cookie-rs/pull/42</s> * <s>https://github.com/hyperium/hyper/pull/740</s> * https://github.com/cyderize/rust-websocket/pull/70 * mozilla/webdriver-rust#28 Source-Repo: https://github.com/servo/servo Source-Revision: 84ab7e9fe8f4a6528995eff3eb6e814cb724c364 UltraBlame original commit: 6da38378a33a0b292803ee9f46fd03ed7ecd4968
…sajeffrey **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * <s>https://github.com/servo/rust-url/pull/176</s> * <s>https://github.com/alexcrichton/cookie-rs/pull/42</s> * <s>https://github.com/hyperium/hyper/pull/740</s> * https://github.com/cyderize/rust-websocket/pull/70 * mozilla/webdriver-rust#28 Source-Repo: https://github.com/servo/servo Source-Revision: 84ab7e9fe8f4a6528995eff3eb6e814cb724c364 UltraBlame original commit: 6da38378a33a0b292803ee9f46fd03ed7ecd4968
…sajeffrey **Do not merge yet:** rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage. Depends on: * <s>https://github.com/servo/rust-url/pull/176</s> * <s>https://github.com/alexcrichton/cookie-rs/pull/42</s> * <s>https://github.com/hyperium/hyper/pull/740</s> * https://github.com/cyderize/rust-websocket/pull/70 * mozilla/webdriver-rust#28 Source-Repo: https://github.com/servo/servo Source-Revision: 84ab7e9fe8f4a6528995eff3eb6e814cb724c364 UltraBlame original commit: 6da38378a33a0b292803ee9f46fd03ed7ecd4968
Do not merge yet, I’d like to get feedback on API design.What’s changing
This is a large rewrite of the data structures and API that aims to eventually become version 1.0.0. Motivations are:
Url
struct is aString
or aVec<String>
.data:
don’t have a host or port number. In practice I think this makes the API a pain to use more than it’s worth.I think the API can be more "rusty", and a bunch of long-standing issues be fixed along the way. The big idea is that the
Url
struct now contains a singleString
of the URL’s serialization (the only heap memory allocation), plus some indices into it (and some other meta-data). The indices allow accessor methods for URL components to return&str
in O(1) time.The fields of
Url
are now private, as they need to maintain a number of invariants. Getting invariants wrong is not a memory-safety hazard (e.g. string indexing is still checked) but can cause non-sensical results.Rather than multiple ad-hoc methods like
serialize_authority
andserialize_without_fragment
,Url
implements indexing with ranges ofPosition
, which is an enum indicating a position between URL components. For example,serialize_without_fragment
is now&url[..Position::AfterQuery]
, takes O(1) time, and returns a&str
slice.While we’re making breaking changes, this is an opportunity to clean up APIs that didn’t turn out so great:
UrlParser
. The builder pattern works well forstd::process::Command
, but for URLs in practice almost onlybase_url
is ever used. v0.5.2 already hasbase_url.join(str)
, which stays the same.lossy_percent_decode_*
methods. They don’t seem to be used at all. Maybe accessors could return aPercentEncodedStr
wrapper that dereferences to&str
and has apercent_decode()
method?percent_encoding
module, there was “duplicated” APIs that returned a newString
orVec<u8>
for convenience, or pushed to an existing&mut String
or&mut Vec<u8>
for flexibility. This was a bit awkward, so I’ve changed it to return iterators ofchar
oru8
which can be used with.extend(…)
or.collect()
. I’m not sure this an improvement and might revert it. Feedback appreciated.Downsides
This new design is great for accessing components, but not so much for modifying them. We need to make changes in the middle of the string, and with percent-encoding we don’t know the encoded size of the new thing until we’ve encoded it. Then we need to fix up indices. As a result, the
set_*
methods have lots of fiddly code where it’s easy to make subtle mistakes. This code should be well-tested before this PR is merged, but it’s not at all yet.Docs
To see what the new API looks like more easily than by reading the code, you can find the output of
cargo doc
at https://simonsapin.github.io/rust-url-1.0-docs/url/Trying it out
Too see what the API usage is like in practice, I’ve updated some crates that use rust-url for this PR:
Since this PR isn’t merged yet, to do the same in your crates you’ll need a clone of the git repository at the right branch:
… and a Cargo path override in the directory of your crate:
.cargo/config
Feedback, please
Since we’re making breaking changes, this is the opportunity to make more (or different) breaking changes before version 1.0 is published and we commit to this new API.
Please comment here to say what are your pain points with the 0.5.x API, what could be improved, what I’ve done in this PR that doesn’t seem like a good idea, etc.
Thanks!