router: reduce unnecessary `String` allocations #1165

hawkw · 2024-11-07T17:59:59Z

Currently, Dropshot's router will allocate and Clone a bunch of Strings during routing which it doesn't really need to be doing. This branch contains two commits which reduce these string clones and allocations:

router: Reduce string copies in path traversal (bfadc03)

Presently, the router's input_path_to_segments function segments the
whole input path, copying each each segment from the input path into a
new owned String, and returns a Vec<String>. This Vec is then
immediately iterated over by reference, and each &String segment is
then to_string()ed into a second owned String that's used when
returning the list of path variables. Path segments which are not
returned (i.e., literals) are still copied into a new String which is
used only to look up that segment in the current node's map of edges,
which doesn't require an owned String to perform. This all feels a bit
unfortunate, as we are allocating path segments a bunch of times more
than we need to. Ideally, we would iterate over slices borrowed from the
InputPath, and allocate Strings only when necessary in order to
return a path variable's value in the lookup result, or when we
percent-decode something.

This commit makes that change. Now, input_path_to_segments returns an
Iterator<Item = Result<Cow<'_, str>, InputPathError>>, which borrows
the segment from the path unless it was necessary to allocate a string
to percent-decode the segment. This does mean that we return an
Iterator of Results rather than a Result and thus have to handle a
potential error every time we get the next segment, but I don't think
that's terrible --- it's kind of the inherent cost of doing path
segmentation lazily. This also means that if we do encounter an error,
or if the path is determined not to route to an endpoint, we don't
continue wasting time segmenting the rest of the path; in the present
implementation, we always segment the whole path up front before
traversing it. I haven't done any real performance analysis of this, but
I imagine the lazy approach is more efficient; it certainly feels
nicer. It's also less vulnerable to issues where we spend a bunch of
time segmenting a really long path that will never route to anything.
router: represent methods as
http::Method (e2bbecb)

Currently, the Router stores HTTP methods for an endpoint as
Strings. This is a bit unfortunate, as it means the already-parsed
http::Method for each request has to be un-parsed back into a String
to look it up in the map, allocating a new String each time and
converting it to uppercases to perform the lookup. Switching to the
typed http::Method representation should make things a bit more
efficient (especially as Method also implements inline storage for
extension methods up to a certain length) and provide additional type
safety. This does require switching from a BTreeMap of strings to
handlers to a HashMap of Methods to handlers, as Method does not
implement Ord/PartialOrd, but ordering shouldn't matter here.

Note that this is not terribly urgent, so don't rush to review it; I just thought it would be nice to do eventually. The change in e2bbecb, storing methods as http::Methods, might be a bit nice to have when we add code for returning an Allow header on "method not allowed" errors after #1164, but it's not a blocker.

Currently, the `Router` stores HTTP methods for an endpoint as `String`s. This is a bit unfortunate, as it means the already-parsed `http::Method` for each request has to be un-parsed back into a `String` to look it up in the map, allocating a new `String` each time and converting it to uppercases to perform the lookup. Switching to the typed `http::Method` representation should make things a bit more efficient (especially as `Method` also implements inline storage for extension methods up to a certain length) and provide additional type safety. This does require switching from a `BTreeMap` of strings to handlers to a `HashMap` of `Method`s to handlers, as `Method` does not implement `Ord`/`PartialOrd`, but ordering shouldn't matter here.

Presently, the router's `input_path_to_segments` function segments the whole input path, copying each each segment from the input path into a new owned `String`, and returns a `Vec<String>`. This `Vec` is then immediately iterated over by reference, and each `&String` segment is then `to_string()`ed into a _second_ owned `String` that's used when returning the list of path variables. Path segments which are not returned (i.e., literals) are still copied into a new `String` which is used only to look up that segment in the current node's map of edges, which doesn't require an owned `String` to perform. This all feels a bit unfortunate, as we are allocating path segments a bunch of times more than we need to. Ideally, we would iterate over slices borrowed from the `InputPath`, and allocate `String`s only when necessary in order to return a path variable's value in the lookup result, or when we percent-decode something. This commit makes that change. Now, `input_path_to_segments` returns an `Iterator<Item = Result<Cow<'_, str>, InputPathError>>`, which borrows the segment from the path unless it was necessary to allocate a string to percent-decode the segment. This does mean that we return an `Iterator` of `Result`s rather than a `Result` and thus have to handle a potential error every time we get the next segment, but I don't think that's terrible --- it's kind of the inherent cost of doing path segmentation lazily. This also means that if we do encounter an error, or if the path is determined not to route to an endpoint, we don't continue wasting time segmenting the rest of the path; in the present implementation, we always segment the whole path up front before traversing it. I haven't done any real performance analysis of this, but I imagine the lazy approach is more efficient; it certainly *feels* nicer. It's also less vulnerable to issues where we spend a bunch of time segmenting a really long path that will never route to anything.

hawkw added 2 commits November 7, 2024 09:43

hawkw requested a review from davepacheco November 7, 2024 17:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

router: reduce unnecessary `String` allocations #1165

router: reduce unnecessary `String` allocations #1165

hawkw commented Nov 7, 2024

router: reduce unnecessary String allocations #1165

Are you sure you want to change the base?

router: reduce unnecessary String allocations #1165

Conversation

hawkw commented Nov 7, 2024

router: reduce unnecessary `String` allocations #1165

router: reduce unnecessary `String` allocations #1165