Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flags to allow multiple spaces in request and status lines #110

Conversation

acfoltzer
Copy link
Contributor

@acfoltzer acfoltzer commented Mar 15, 2022

Depends on #111.

These new flags set whether multiple spaces are allowed as delimiters in request lines and response status lines.

The latest version of the HTTP/1.1 spec (request lines, response status lines) allows implementations to parse multiple whitespace characters in place of the SP delimiter in the response status line, including:

SP, HTAB, VT (%x0B), FF (%x0C), or bare CR

This option relaxes the parsers to allow for multiple spaces, but does not allow the delimiters to contain the other mentioned whitespace characters. We'd rather wait for someone to have a concrete use case before deciding to support that, as allowing chars like \r raises serious security questions as described by the spec.

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@@ -499,7 +544,18 @@ impl<'h, 'b> Response<'h, 'b> {
complete!(skip_empty_lines(&mut bytes));
self.version = Some(complete!(parse_version(&mut bytes)));
space!(bytes or Error::Version);
Copy link
Contributor Author

@acfoltzer acfoltzer Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat off-topic for this PR, but I was tempted to change this to return Error::Status instead. Without the new flag enabled, HTTP/1.1 200\r\n currently will fail with Error::Version which doesn't seem quite right. I left it alone to minimize this change though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I... do not why Github is not rendering the spaces in that markdown correctly. How about in block form:

HTTP/1.1   200\r\n

@acfoltzer acfoltzer marked this pull request as ready for review March 15, 2022 21:13
@acfoltzer acfoltzer marked this pull request as draft March 15, 2022 22:43
@acfoltzer
Copy link
Contributor Author

Marking this as a draft temporarily, it turns out I was premature and we'll need this for request lines too 😞

@acfoltzer acfoltzer force-pushed the acfoltzer/multiple-space-status-line-delimiters branch from 26718f9 to de7ff46 Compare March 16, 2022 21:56
@acfoltzer
Copy link
Contributor Author

This is ready for review, but will depend on #111 to actually pass tests etc. It doesn't look like I can change the base branch of the PR to point to my fork, but once #111 is merged it should update automatically.

@acfoltzer acfoltzer marked this pull request as ready for review March 16, 2022 21:57
@acfoltzer acfoltzer changed the title Add flag to allow multiple spaces in response status lines Add flags to allow multiple spaces in request and status lines Mar 16, 2022
@acfoltzer
Copy link
Contributor Author

Sigh, the fuzzer found a bug with this PR after combining with #111. Back to draft for now.

@acfoltzer acfoltzer marked this pull request as draft March 16, 2022 22:42
@acfoltzer acfoltzer force-pushed the acfoltzer/multiple-space-status-line-delimiters branch 2 times, most recently from 8ccecd6 to 401f9a9 Compare March 18, 2022 17:43
@acfoltzer
Copy link
Contributor Author

I have rewritten this to eagerly check the flag rather than backtracking after an error, which fixes the bugs mentioned above and obviates the issue where the reason phrase would include leading spaces from the delimiter. Happily this seems to be a performance improvement rather than a regression even when the new flags are disabled (results vs 6f6ff10):

req/req                 time:   [292.13 ns 295.81 ns 301.23 ns]
                        thrpt:  [2.2261 GiB/s 2.2668 GiB/s 2.2954 GiB/s]
                 change:
                        time:   [-13.390% -12.468% -11.159%] (p = 0.00 < 0.05)
                        thrpt:  [+12.561% +14.244% +15.460%]
                        Performance has improved.

req_short/req_short     time:   [62.834 ns 62.992 ns 63.188 ns]
                        thrpt:  [1.0023 GiB/s 1.0054 GiB/s 1.0079 GiB/s]
                 change:
                        time:   [-11.112% -9.9009% -8.9416%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8196% +10.989% +12.501%]
                        Performance has improved.

resp/resp               time:   [303.51 ns 304.23 ns 305.34 ns]
                        thrpt:  [2.1320 GiB/s 2.1398 GiB/s 2.1449 GiB/s]
                 change:
                        time:   [-12.059% -11.512% -11.016%] (p = 0.00 < 0.05)
                        thrpt:  [+12.379% +13.010% +13.713%]
                        Performance has improved.

resp_short/resp_short   time:   [67.521 ns 67.686 ns 67.929 ns]
                        thrpt:  [1.2476 GiB/s 1.2521 GiB/s 1.2552 GiB/s]
                 change:
                        time:   [-9.1562% -8.7366% -8.3331%] (p = 0.00 < 0.05)
                        thrpt:  [+9.0906% +9.5730% +10.079%]
                        Performance has improved.

I haven't thrown it into godbolt or anything yet, but I suspect something about this change triggers a different inlining behavior compared to the baseline.

@acfoltzer acfoltzer marked this pull request as ready for review March 18, 2022 18:06
These new flags set whether multiple spaces are allowed as delimiters in request lines and response
status lines.

The latest version of the HTTP/1.1 spec ([request lines][spec-req], [response status
lines][spec-resp]) allows implementations to parse multiple whitespace characters in place of the
`SP` delimiter in the response status line, including:

> SP, HTAB, VT (%x0B), FF (%x0C), or bare CR

This option relaxes the parsers to allow for multiple spaces, but does *not* allow the delimiters to
contain the other mentioned whitespace characters. We'd rather wait for someone to have a concrete
use case before deciding to support that, as allowing chars like `\r` raises serious security
questions as described by the spec.

Happily this seems to be a performance _improvement_ rather than a regression even when the new
flags are disabled (results vs 6f6ff10):

```
req/req                 time:   [292.13 ns 295.81 ns 301.23 ns]
                        thrpt:  [2.2261 GiB/s 2.2668 GiB/s 2.2954 GiB/s]
                 change:
                        time:   [-13.390% -12.468% -11.159%] (p = 0.00 < 0.05)
                        thrpt:  [+12.561% +14.244% +15.460%]
                        Performance has improved.

req_short/req_short     time:   [62.834 ns 62.992 ns 63.188 ns]
                        thrpt:  [1.0023 GiB/s 1.0054 GiB/s 1.0079 GiB/s]
                 change:
                        time:   [-11.112% -9.9009% -8.9416%] (p = 0.00 < 0.05)
                        thrpt:  [+9.8196% +10.989% +12.501%]
                        Performance has improved.

resp/resp               time:   [303.51 ns 304.23 ns 305.34 ns]
                        thrpt:  [2.1320 GiB/s 2.1398 GiB/s 2.1449 GiB/s]
                 change:
                        time:   [-12.059% -11.512% -11.016%] (p = 0.00 < 0.05)
                        thrpt:  [+12.379% +13.010% +13.713%]
                        Performance has improved.

resp_short/resp_short   time:   [67.521 ns 67.686 ns 67.929 ns]
                        thrpt:  [1.2476 GiB/s 1.2521 GiB/s 1.2552 GiB/s]
                 change:
                        time:   [-9.1562% -8.7366% -8.3331%] (p = 0.00 < 0.05)
                        thrpt:  [+9.0906% +9.5730% +10.079%]
                        Performance has improved.
```

I haven't thrown it into godbolt or anything yet, but I suspect something about this change triggers
a different inlining behavior compared to the baseline.

[spec-req]: https://httpwg.org/http-core/draft-ietf-httpbis-messaging-latest.html#rfc.section.3.p.3
[spec-resp]: https://httpwg.org/http-core/draft-ietf-httpbis-messaging-latest.html#rfc.section.4.p.3
@acfoltzer acfoltzer force-pushed the acfoltzer/multiple-space-status-line-delimiters branch from 401f9a9 to 36871cc Compare March 29, 2022 23:19
Copy link
Owner

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@seanmonstar seanmonstar merged commit 24c723a into seanmonstar:master Mar 31, 2022
@seanmonstar
Copy link
Owner

Is there any other parsing changes you expect, or shall we prep a crate release?

@acfoltzer
Copy link
Contributor Author

@seanmonstar we should be good to prep a release, thanks! I've been fuzzing this against our internal fork of picohttpparser and am not finding any other places for improvement, although I am excluding path parsing from the fuzzing for now. Paths are more difficult because both httparse and pico lean on other libraries to meaningfully interpret them, so future changes that come out of that work are more likely to land in http anyway.

@acfoltzer acfoltzer deleted the acfoltzer/multiple-space-status-line-delimiters branch April 1, 2022 21:49
@nox
Copy link
Contributor

nox commented Apr 25, 2022

This broke some stuff: prior to this PR, there was no way to parse a request with a specific parser config, that allowed me to implement allow_spaces_after_header_name_in_responses and allow_obsolete_multiline_headers_in_responses only for responses without explicit checks.

Now that there is a ParserConfig::parse_request method, those flags are applied to both requests and responses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants