Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: reduce unnecessary String allocations #1165

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Nov 7, 2024

Currently, Dropshot's router will allocate and Clone a bunch of Strings during routing which it doesn't really need to be doing. This branch contains two commits which reduce these string clones and allocations:

  • router: Reduce string copies in path traversal (bfadc03)

    Presently, the router's input_path_to_segments function segments the
    whole input path, copying each each segment from the input path into a
    new owned String, and returns a Vec<String>. This Vec is then
    immediately iterated over by reference, and each &String segment is
    then to_string()ed into a second owned String that's used when
    returning the list of path variables. Path segments which are not
    returned (i.e., literals) are still copied into a new String which is
    used only to look up that segment in the current node's map of edges,
    which doesn't require an owned String to perform. This all feels a bit
    unfortunate, as we are allocating path segments a bunch of times more
    than we need to. Ideally, we would iterate over slices borrowed from the
    InputPath, and allocate Strings only when necessary in order to
    return a path variable's value in the lookup result, or when we
    percent-decode something.

    This commit makes that change. Now, input_path_to_segments returns an
    Iterator<Item = Result<Cow<'_, str>, InputPathError>>, which borrows
    the segment from the path unless it was necessary to allocate a string
    to percent-decode the segment. This does mean that we return an
    Iterator of Results rather than a Result and thus have to handle a
    potential error every time we get the next segment, but I don't think
    that's terrible --- it's kind of the inherent cost of doing path
    segmentation lazily. This also means that if we do encounter an error,
    or if the path is determined not to route to an endpoint, we don't
    continue wasting time segmenting the rest of the path; in the present
    implementation, we always segment the whole path up front before
    traversing it. I haven't done any real performance analysis of this, but
    I imagine the lazy approach is more efficient; it certainly feels
    nicer. It's also less vulnerable to issues where we spend a bunch of
    time segmenting a really long path that will never route to anything.

  • router: represent methods as
    http::Method
    (e2bbecb)

    Currently, the Router stores HTTP methods for an endpoint as
    Strings. This is a bit unfortunate, as it means the already-parsed
    http::Method for each request has to be un-parsed back into a String
    to look it up in the map, allocating a new String each time and
    converting it to uppercases to perform the lookup. Switching to the
    typed http::Method representation should make things a bit more
    efficient (especially as Method also implements inline storage for
    extension methods up to a certain length) and provide additional type
    safety. This does require switching from a BTreeMap of strings to
    handlers to a HashMap of Methods to handlers, as Method does not
    implement Ord/PartialOrd, but ordering shouldn't matter here.

Note that this is not terribly urgent, so don't rush to review it; I just thought it would be nice to do eventually. The change in e2bbecb, storing methods as http::Methods, might be a bit nice to have when we add code for returning an Allow header on "method not allowed" errors after #1164, but it's not a blocker.

Currently, the `Router` stores HTTP methods for an endpoint as
`String`s. This is a bit unfortunate, as it means the already-parsed
`http::Method` for each request has to be un-parsed back into a `String`
to look it up in the map, allocating a new `String` each time and
converting it to uppercases to perform the lookup. Switching to the
typed `http::Method` representation should make things a bit more
efficient (especially as `Method` also implements inline storage for
extension methods up to a certain length) and provide additional type
safety. This does require switching from a `BTreeMap` of strings to
handlers to a `HashMap` of `Method`s to handlers, as `Method` does not
implement `Ord`/`PartialOrd`, but ordering shouldn't matter here.
Presently, the router's `input_path_to_segments` function segments the
whole input path, copying each each segment from the input path into a
new owned `String`, and returns a `Vec<String>`. This `Vec` is then
immediately iterated over by reference, and each `&String` segment is
then `to_string()`ed into a _second_ owned `String` that's used when
returning the list of path variables. Path segments which are not
returned (i.e., literals) are still copied into a new `String` which is
used only to look up that segment in the current node's map of edges,
which doesn't require an owned `String` to perform. This all feels a bit
unfortunate, as we are allocating path segments a bunch of times more
than we need to. Ideally, we would iterate over slices borrowed from the
`InputPath`, and allocate `String`s only when necessary in order to
return a path variable's value in the lookup result, or when we
percent-decode something.

This commit makes that change. Now, `input_path_to_segments` returns an
`Iterator<Item = Result<Cow<'_, str>, InputPathError>>`, which borrows
the segment from the path unless it was necessary to allocate a string
to percent-decode the segment. This does mean that we return an
`Iterator` of `Result`s rather than a `Result` and thus have to handle a
potential error every time we get the next segment, but I don't think
that's terrible --- it's kind of the inherent cost of doing path
segmentation lazily. This also means that if we do encounter an error,
or if the path is determined not to route to an endpoint, we don't
continue wasting time segmenting the rest of the path; in the present
implementation, we always segment the whole path up front before
traversing it. I haven't done any real performance analysis of this, but
I imagine the lazy approach is more efficient; it certainly *feels*
nicer. It's also less vulnerable to issues where we spend a bunch of
time segmenting a really long path that will never route to anything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant