Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consolidate AcceptXXXHeader classes into Accept #460

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mmerickel
Copy link
Member

@mmerickel mmerickel commented Feb 26, 2024

  • Ensure Accept instances are immutable.
  • Merge missing/invalid/valid into the single instance - there were very very minor differences between the classes.
  • Delete deprecated __iter__.
  • Reimplement best_match, quality, and __contains__ methods using acceptable_offers.
  • Add HeaderState for simplified checking of the state.
  • Add enum-tools[sphinx] dependency for dealing with HeaderState.
  • Delete deprecated MIMEAccept class.
  • Fix broken dependency on setuptools in docs.

ping @whiteroses incase this is something you're interested in reviewing

@mmerickel mmerickel force-pushed the simplify-accept-parsing branch 4 times, most recently from 8af3f6b to cbc0f3b Compare February 27, 2024 03:31
@mmerickel mmerickel changed the title consolidate accept classes into AcceptHeader consolidate accept classes into Accept Feb 27, 2024
@mmerickel mmerickel force-pushed the simplify-accept-parsing branch 13 times, most recently from 7a5e3ce to b0b14d7 Compare March 1, 2024 06:44
@mmerickel mmerickel changed the title consolidate accept classes into Accept consolidate AcceptXXXHeader classes into Accept Mar 2, 2024
@mmerickel mmerickel force-pushed the simplify-accept-parsing branch 3 times, most recently from 33ee697 to af05bfd Compare March 2, 2024 21:32
@whiteroses
Copy link
Contributor

Hi @mmerickel, just saw your ping - it's been a while, so I'll need to find some time to refamiliarise myself with everything (may take a few days), but will review.

@whiteroses
Copy link
Contributor

Just to let you know, I'm almost done with the reviewing - will post comments tomorrow.

See backward incompatibilities below for more information.
See https://github.com/Pylons/webob/pull/460

- ``webob.acceptparse.Accept``, methods ``best_match``, ``quality``, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "webob.acceptparse.Accept's methods...", without the comma, be clearer?

- Remove previously-deprecated ``webob.acceptparse.Accept.__iter__``.
See https://github.com/Pylons/webob/pull/460

- ``webob.acceptparse.Accept`` methods, ``best_match``, ``quality``, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "webob.acceptparse.Accept's methods best_match, ..." be clearer?

Comment on lines +50 to +55
modifies their behavior slightly:

- Offers containing wildcards are no longer allowed.
- A tuple can no longer be an offer containing server-side quality values.
- An offer will only match a wildcard clause in the header, such as ``*/*``
or ``text/*`` if it does not match something more specific.
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, their behaviour has been modified more than slightly: the switch from the old algorithms to the one in .acceptable_offers involved many changes - some of which were documented in several of the docstrings that are being removed, but even those docstrings didn't include everything. As these methods have been deprecated for such a long time, it may be more helpful and accurate/precise to say that the methods have switched to using the algorithm in .acceptable_offers to conform to RFC 7231?

(I also had trouble understanding "A tuple can no longer be an offer containing server-side quality values." - is it saying that offers can no longer contain server-side quality values?)

(The same text is used in the docstrings of .best_match, .quality and .__contains__, so this also applies to them.)

Comment on lines +95 to +96
The state of an accept-style header to assist in identifying scenarios
an application may want to know about during accept-negotation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "accept-negotation"?

The docstring seems a bit unclear - it may be helpful to explain that an Accept header may have a valid value, an invalid value, or may be missing from the request; and that this enum represents which of the three states the header is in?

@@ -339,7 +362,7 @@ def _parse_media_type_params(cls, media_type_params_segment):
value = cls._process_quoted_string_token(token=value)
media_type_params[index] = (name, value)

return media_type_params
return tuple(media_type_params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value has been changed to a tuple, but the docstring is still saying that it returns a list?

@@ -358,6 +381,9 @@ def _python_value_to_header_str(cls, value):
Convert Python value to header string for __add__/__radd__.
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both the class method's name and its docstring are saying that the class method returns a header string, but it has been modified to return None when value is None?

The class method is also used in .__init__ and accept_property (and indirectly in .__add__ and .__radd__, via use of create_accept_header), so what the docstring says about the class method being for __add__ and __radd__ is no longer true?

Comment on lines +428 to +430
:return: If ``value`` is a valid ``Accept`` header, returns an iterator
of ``(*media_range*, *qvalue*, *media_type_params*,
*extension_params*)`` tuples, as parsed from the header from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original thinking in not using double backticks around the tuple was that media_range etc. were not variables, so the tuple was treated as part of the prose and the individual elements were emphasised with asterisks (italics) instead. If double backticks are to be used, the asterisks here and in the rest of the docstring should probably be removed or replaced with double backticks?

@@ -421,7 +447,7 @@ def parse(cls, value):
| *extension_params* is the extension parameters, as a list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"extension_params" has been changed from a list to a tuple?

Comment on lines +597 to +598
A tuple of ``(*media_range*, *qvalue*, *media_type_params*,
*extension_params*)`` tuples, where
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as https://github.com/Pylons/webob/pull/460/files#r1531436169 with the asterisks here and in the rest of this docstring.

Comment on lines +709 to +710
Any offers that cannot be parsed via :meth:`.parse_offer`
will be ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.parse_offer is not currently included in the documentation, so this is not linking to anything?

will be ignored.

:param offers:
``iterable`` of ``str`` media types (media types can
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offers is a sequence, not an iterable - I think it is the same issue as #442, and there are probably a few of these around, likely my mistakes from back then. Not sure if you'd like to fix them in these PRs - or I could submit a PR to fix them after you're done.

Comment on lines +716 to +720
:return:
A list of tuples of the form (media type, qvalue), in
descending order of qvalue. Where two offers match the same
qvalue, they are returned in the same order as their order in
``offers``.
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for the return value from _AcceptInvalidOrNoHeader.acceptable_offers needs to be included here to explain what happens with the return value when the header is invalid or missing? (The single backticks should be double backticks though.)

Comment on lines +832 to +834
(iterable)

:meth:`AcceptValidHeader.best_match` uses its own algorithm (one not
specified in :rfc:`RFC 7231 <7231>`) to determine what is a best
match. The algorithm has many issues, and does not conform to
:rfc:`RFC 7231 <7231>`.
| Each item in the iterable must be a ``str`` media type and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"iterable" should be "sequence", as explained here.

match. The algorithm has many issues, and does not conform to
:rfc:`RFC 7231 <7231>`.
| Each item in the iterable must be a ``str`` media type and
may contain params/extensions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Media types do not have extension parameters (https://datatracker.ietf.org/doc/html/rfc7231.html#section-3.1.1.1), and .acceptable_offers does not recognise them in media type offers.

Comment on lines +840 to +846
:return:
(``str``, or the type of ``default_match``)

:param offers: (iterable)
| The offer that is the best match based on q-value. If there is no
match, the value of ``default_match`` is returned. Where two
offers match the same qvalue, they are returned in the same order
as their order in ``offers``.
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.best_match returns a single offer, so "they are returned" seems a little confusing?

Also, inconsistent spelling of "qvalue"/"q-value"? (RFC spells it without a hyphen.)

Comment on lines -1057 to +873
:rfc:`RFC 7231, section 5.3.2 <7231#section-5.3.2>`. It does not
correctly take into account media type parameters:
:param offer: (``str``) media type offer
:return: (``float`` or ``None``)

>>> instance = AcceptValidHeader('text/html')
>>> instance.best_match(offers=['text/html;p=1']) is None
True
| The highest quality value from the media range(s) that match
the `offer`, or ``None`` if there is no match.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to copy over the documentation for the return value in _AcceptInvalidOrNoHeader.quality, to explain what happens when the header is invalid or missing?

Comment on lines +933 to +935
The rules for adding values to a header are that the values are
appended if valid, or discarded. If everything is discarded then an
instance representing a missing header is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as a bit unclear - maybe could do with some rewording, or the explanations from the docstrings of the three .__add__s could be copied over and updated?

if self.header_value == "":
return other
if other.header_value == "":
return self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states that a new header object is always created, but if other.header_value == "", self is returned?


if isinstance(other, (AcceptNoHeader, AcceptInvalidHeader)):
return AcceptNoHeader()
other = create_accept_header(other)
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using create_accept_header to handle all the possible types for the value of other, but create_accept_header is documented as taking a parameter of type str or None?

It seems to be working anyway, because create_accept_header calls Accept.__init__ (which is also documented to take a header_value parameter of str or None), and that in turn calls ._python_value_to_header_str, which happens to do the right thing. Before, it was done via ._add_instance_and_non_accept_type, which calls ._python_value_to_header_str - but ._add_instance_and_non_accept_type has been removed? (I think the old __add__ supported all these many types, and the thinking at the time was to keep the types that were accepted by create_accept_header and .__init__ simple, while maintaining backward compatibility in __add__.)

Same issue with the use of create_accept_header in .__radd__?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The add and radd support what they did before - I just moved the logic into the __init__ as you noted. It sounds like you just want the __init__ to be documented to take those other types which is fine - I can do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are actually changing .__init__ to make it accept all those other types of values accepted by .__add__/.__radd__, then yes, it's only the documentation that needs to be updated.


| If `header_value` is an invalid ``Accept`` header, an
:class:`AcceptInvalidHeader` instance.
:param header_value: (``str`` or ``None``) header value
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be mentioned that None is used to represent a missing header?

There is code to handle the case where header_value is an Accept instance, but here the docstring says that header_value is either str or None?

Comment on lines 23 to +29
The classes that may be returned by one of the functions above, and their
methods:

.. autoclass:: Accept
:members: parse

.. autoclass:: AcceptOffer
:members: __str__

.. autoclass:: AcceptValidHeader
:members: parse, header_value, parsed, __init__, __add__, __bool__,
__contains__, __iter__, __radd__, __repr__, __str__,
accept_html, accepts_html, acceptable_offers, best_match, quality

.. autoclass:: AcceptNoHeader
:members: parse, header_value, parsed, __init__, __add__, __bool__,
__contains__, __iter__, __radd__, __repr__, __str__,
accept_html, accepts_html, acceptable_offers, best_match, quality
.. autoenum:: HeaderState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and their methods" makes the sentence quite ambiguous, and as the documentation for the classes also includes their non-method attributes, perhaps "and their methods" could be removed? (Assuming I'm reading the intended meaning correctly.)

Under this, AcceptOffer and HeaderState are included - but they are not classes that "may be returned by one of the functions above"?

assert result.header_value == left_operand.header_value
assert result is not left_operand
def test_copy(self):
instance = Accept("*/plain;charset=utf8;x-version=1")
Copy link
Contributor

@whiteroses whiteroses Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/plain is valid according to https://datatracker.ietf.org/doc/html/rfc7231.html#section-5.3.2, but I think this is an oversight in the spec - I don't think they intended for it to be possible for type to be * when the subtype is not *. Not 100% certain about this, but as I think we can use any valid header value here, probably better to use one that we know is intended to be valid.

From the RFC: 'The "q" parameter is necessary if any extensions (accept-ext) are present, since it acts as a separator between the two parameter sets.' So this needs a "q" parameter before the extension parameter for the header to be valid (assuming that you want to use a valid header value here, and not an intentionally invalid one).

It does not seem like the missing "q" parameter is being detected by the regex at the moment - however, extension parameters have been removed in RFC 9110, which obsoletes RFC 7231, so whether it's worth fixing may depend on when you would like WebOb to start conforming to RFC 9110?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to removing extension param support - I don't have a strong opinion here. I'm not sure it's really hurting things much to keep them though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 9110 says that "The accept extension grammar (accept-params, accept-ext) has been removed because it had a complicated definition, was not being used in practice, and is more easily deployed through new header fields." I imagine they must be pretty certain that it's not being used in practice, to feel able to make the change.

I don't see how it would hurt things to keep the extension param support either, but it'd be nice to simplify things and be able to remove some code, I imagine. Not sure what you think about moving to conform to RFC 9110 in general - as RFC 7231 is obsoleted by RFC 9110, I imagine WebOb would want to make the move at some point?

If you would like to keep the extension param support for the time being, I can look into why the regex did not catch the missing "q" parameter separator?

Comment on lines -1223 to -1227
def test_best_match_zero_quality(self):
assert (
AcceptValidHeader("text/plain, */*;q=0").best_match(["text/html"]) is None
)
assert "audio/basic" not in AcceptValidHeader("*/*;q=0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These asserts seem to be worth keeping - I don't think they or any functionally equivalent asserts are in the new tests? (One assert is for .best_match and the other is for .__contains__ - not sure why the .__contains__ one was placed into a .best_match test method.)

valid_values_with_headers,
)
def test_valid_add_missing(self, input_value, input_header, maker, fn):
inst = Accept(input_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Accept.__init__ docstring, the header_value parameter is documented to be of type str or None, but other types are passed to Accept.__init__ in this test and quite a few others?

instance = AcceptInvalidHeader(header_value=", ")
returned = instance.quality(offer="type/subtype")
assert returned == 1.0
accept = Accept(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Accept(", ") instead of Accept(None)?

Invalid = "Invalid"
Missing = "Missing"


class AcceptOffer(namedtuple("AcceptOffer", ["type", "subtype", "params"])):
"""
A pre-parsed offer tuple represeting a value in the format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in "represeting"?

Copy link
Contributor

@whiteroses whiteroses left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Accept-* headers" section of docs/reference.txt would also need to be updated for this PR (L347 and L358-364).

Could use :type: and :rtype: (https://www.sphinx-doc.org/en/master/usage/domains/python.html#info-field-lists) to specify parameter and return value types in the documentation?

RFC 7231 is obsoleted by RFC 9110 - does that affect this PR?

Otherwise, all looks good from what I can see!

@mmerickel
Copy link
Member Author

mmerickel commented Mar 20, 2024

@whiteroses Thank you so much - excellent reviews. I will work on these when I'm back from vacation and try to apply similar fixes to the other branches. Looks like you dinged me good on docstrings but the code itself has passed muster which I'm happy about.

@whiteroses
Copy link
Contributor

Yes, most of them were docstring issues - enjoy your vacation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants