Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce per REST endpoint media types #64406

Merged

Conversation

pgomulka
Copy link
Contributor

@pgomulka pgomulka commented Oct 30, 2020

Per REST endpoint media types declaration allows to make parsing/validation more strict.

If a media type was declared only in one endpoint (for instance CSV in SQL endpoint) it should not be allowed to parse that media type when using other endpoints.
However, the Compatible API need to be able to understand all media types supported by Elasticsearch in order to parse a compatible-with=version parameter.
This implies that endpoints need to declare which media type they support and how to parse them (if introducing new media types - like SQL).

How to parse:
MediaType interface still serves as an abstraction on top of XContentType and TextFormat. It also has a declaration of mappings String-MediaType with parameters. Parameters declares the names of parameters and regex to validate its values.
This instructs how to perform the parsing. For instance - XContentType.JSON has the mapping of application/vnd.elasticsearch+json -> JSON and allows parameters compatible-with=\d and charset=utf-8

MediaTypeParser was simplified into ParsedMediaType class with static factory method for parsing.

How to declare:
RestHandler interface is extended with a validAcceptMediaTypes which returns a MediaTypeRegistry - a class that encapsulates mappings of string (type/subtype) to MediaType, allowed parameters and formatPathParameter values.
We only need to allow of declaration of valid media types for Accept header. Content-Type valid media types are fixed to XContentType instances - json, yaml, smile, cbor.

relates #51816

@pgomulka pgomulka added :Core/Infra/REST API REST infrastructure and utilities v8.0.0 labels Oct 30, 2020
@pgomulka pgomulka self-assigned this Oct 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/REST API)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Oct 30, 2020
Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes are looking good. most of the comments are mostly around naming or minor things.

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good a couple more comments.

if (paramsAsString.isEmpty()) {
continue;
}
// intentionally allowing to have spaces around `=` (RFC disallows this)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only mention I saw in the spec was " No whitespace is allowed between the header field-name and colon. "

The field value is not covered by this mandate:

header-field   = field-name ":" OWS field-value OWS

This does not explicitly state that the field-value can not have whitespace. For example, "application/<space>json" isn't forbidden by the HTTP spec (spaces are allowed in the field-value). The correctness of spaces in mimetypes or parameters would be defined by https://tools.ietf.org/html/rfc6838 (media type spec). While it is pretty clear that "application/<space>json" != "application/json", I think we should still allow it since that is a nightmare to troubleshoot. Also the spec is not clear on if ;<space>param<space>=<space>value is acceptable. I think it actually is.

I would suggest to remove the "RFC disallows this".

Copy link
Contributor Author

@pgomulka pgomulka Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by RFC I meant https://tools.ietf.org/html/rfc7231#section-3.1.1.1 where it refers to the rfc6838 (the one you quoted) and makes it more strict

 Internet media types ought to be registered with IANA according to
   the procedures defined in [BCP13].

      Note: Unlike some similar constructs in other header fields, media
      type parameters do not allow whitespace (even "bad" whitespace)
      around the "=" character.

[BCP13] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13,
RFC 6838, January 2013.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that, sorry. You were right, probably shouldn't be lenient here, but only with the whitespace around the =. The code here is fine, but a follow up (or on this PR) to remove adhere to that would probably be best. sorry for steering you in the wrong direction on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add validation around =

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterations. LGTM

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some formatting nits and a request to expand on code level comments in one spot. LGTM

@@ -104,6 +106,23 @@ private RestRequest(NamedXContentRegistry xContentRegistry, Map<String, String>
this.requestId = requestId;
}

private static @Nullable ParsedMediaType parseHeaderWithMediaType(Map<String, List<String>> headers, String headerName) {
//TODO: make all usages of headers case-insensitive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a not well documented fact, but by default Netty uses a case insensitive hashing strategy for retrieval of headers. Checkout the DefaultHttpHeaders class if you're interested. Due to our use of Netty for HTTP parsing we also get this benefit in the nio transport. Also, see the Netty4HttpRequest/NioHttpRequest.HttpHeadersMap class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I did not know about this.
I wrongly relied on tests only using map from java collections.
Maybe we might want to refactor usages of java.util.Map to an interface that documents/enforces case insensitivity?
I will leave this todo for now so that we can reconsider this in the future

@@ -71,6 +72,14 @@ public ClientYamlTestResponse(Response response) throws IOException {
}
}

private XContentType getContentTypeIgnoreExceptions(String contentType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love if we can move the comment from the usage to a method level documentation that expands on why we use null; I get it now but at some point it won't be obvious anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, will add more comment on per method javadoc

@pgomulka pgomulka merged commit c219e17 into elastic:master Nov 5, 2020
pgomulka added a commit to pgomulka/elasticsearch that referenced this pull request Nov 5, 2020
Per https://tools.ietf.org/html/rfc7231#section-3.1.1.1 MediaType can
have spaces around parameters pair, but do not allow spaces within
parameter. That means that parameter pair forbids spaces around `=`

follow up after
elastic#64406 (comment)
relates elastic#51816
pgomulka added a commit that referenced this pull request Nov 6, 2020
Per https://tools.ietf.org/html/rfc7231#section-3.1.1.1 MediaType can
have spaces around parameters pair, but do not allow spaces within
parameter. That means that parameter pair forbids spaces around =

follow up after
#64406 (comment)
relates #51816
bpintea added a commit to bpintea/elasticsearch that referenced this pull request Jun 15, 2021
This adds a (branch-specific) media type parser as a near-drop-in
replacement for the (master-only) per-REST-endpoint media types parser
(elastic#64406).
bpintea added a commit that referenced this pull request Jun 21, 2021
This adds a (branch-specific) media type parser as a near-drop-in
replacement for the (master-only) per-REST-endpoint media types parser
(#64406).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/REST API REST infrastructure and utilities >refactoring Team:Core/Infra Meta label for core/infra team v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants