Deprecate and schedule removal of collections.abc.ByteString and typing.ByteString #91896

Closed
JelleZijlstra opened this issue Apr 25, 2022 · 10 comments
Assignees
Labels
topic-typing type-feature A feature request or enhancement

Comments

@JelleZijlstra
Member

JelleZijlstra commented Apr 25, 2022

The current docstring of collections.abc.ByteString is:

    """This unifies bytes and bytearray.

    XXX Should add all their methods.
    """

Let's do that last thing. This will be useful for typing code that accepts both bytes and bytearray, especially with my proposal in PEP-688 to make bytes no longer acceptable as a shortcut for bytearray in the type system.

cc @rhettinger for collections.abc

Linked PRs

@rhettinger
Contributor

rhettinger commented Apr 25, 2022

The general rule is that we can never add methods to ABCs once they are published. The purpose of an ABC is to promise that a minimal set of methods is available. If isinstance(x, SomeABC) returns true, the methods in the ABC are expected to be present. Adding methods presents a problem for existing code that has registered a class as being compliant with SomeABC. If it lacks the new methods, then the registered promise is invalid.

Perhaps because this is an empty ABC that is almost entirely unused, this might be okay. On the other hand, because it is an empty ABC that is almost entirely unused, there is almost nothing to be gained by adding it. AFAICT no one has ever asked for or needed this since the ABC was added 15 years ago. That suggests that there is no problem to be solved here.
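To make the registration concern concrete, here is a minimal sketch. `Shape`, `Square`, and `perimeter` are hypothetical names invented for illustration; the point is only that `register()` records a promise without checking any methods, so a method added to an ABC later can silently be missing from registered classes.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical ABC. Imagine `perimeter` was added in a later release."""

    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def perimeter(self) -> float: ...


class Square:
    """Hypothetical third-party class written against the original ABC."""

    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# Registration is a bare promise: nothing verifies the methods.
Shape.register(Square)

sq = Square(2.0)
print(isinstance(sq, Shape))     # the isinstance check still passes
print(hasattr(sq, "perimeter"))  # but the newer method is absent
```

Classes that inherit from the ABC directly would instead fail at instantiation time, which is a different kind of breakage but breakage all the same.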

@rhettinger rhettinger self-assigned this Apr 25, 2022
@JelleZijlstra
Member Author

> Perhaps because this is an empty ABC that is almost entirely unused, this might be okay.

Right, the ABC is hardly useful with no methods. I searched on grep.app and found no uses of ByteString.register except for one of memoryview in what appears to be an old copy of typing.py.

> On the other hand, because it is an empty ABC that is almost entirely unused, there is almost nothing to be gained by adding it. AFAICT no one has ever asked for or needed this since the ABC was added 15 years ago. That suggests that there is no problem to be solved here.

Well, I'm asking for it now. The use case is type annotating code that accepts both bytes and bytearray.

@rhettinger
Contributor

rhettinger commented Apr 25, 2022

If you were to do it correctly, the procedure would be to subclass ByteString and add the new methods in a subclass. Loosely, this is similar to why we had to add IterableUserDict in Python 2 rather than modifying the existing UserDict.

To annotate code that accepts either bytes or bytearray, wouldn't the correct way be to write b: bytes | bytearray, just like you would with tuple | list?

One other thought: the name ByteString wasn't very good to begin with, as it suggests bytes | str. If you really must do this, it would be better to create a new, well-named ABC with the requisite methods and either leave the old ABC around (it isn't hurting anything) or deprecate it.
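As a rough illustration of the union annotation suggested above, the sketch below uses `typing.Union` so that it also runs on interpreters older than 3.10; on newer versions the alias can be written as `bytes | bytearray`. The function `strip_header` and its `b"HDR:"` default are made up for this example.

```python
from typing import Union

# On Python 3.10+ this can be spelled `bytes | bytearray` directly.
BytesOrByteArray = Union[bytes, bytearray]


def strip_header(data: BytesOrByteArray, header: bytes = b"HDR:") -> bytes:
    """Drop a hypothetical header prefix and return immutable bytes."""
    # startswith() and slicing exist on both bytes and bytearray,
    # so a type checker accepts them on the union.
    if data.startswith(header):
        data = data[len(header):]
    return bytes(data)


print(strip_header(b"HDR:payload"))
print(strip_header(bytearray(b"HDR:payload")))
```

Because the union names concrete types, a type checker knows every method shared by both, which an empty ABC cannot offer.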

@rhettinger
Contributor

I forgot to mention that the docs promise something narrower than the bytes/bytearray API. They say, "ABCs for read-only and mutable sequences". There is no promise of the extra methods found in bytes or bytearray. Existing code may reasonably be subclassing from ByteString and not providing or expecting any of the stringlike methods.

All around, I think this proposal is a contract violation and that a new ABC should be created. A scan on grep.app is insufficient to show this won't be a breaking change (most of the world's Python code isn't publicly visible).

@JelleZijlstra
Member Author

> I forgot to mention that the docs promise something narrower than the bytes/bytearray API. They say, "ABCs for read-only and mutable sequences". There is no promise of the extra methods found in bytes or bytearray. Existing code may reasonably be subclassing from ByteString and not providing or expecting any of the stringlike methods.

That documentation is for Sequence, MutableSequence, and ByteString together (https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence). "Read-only and mutable sequences" is a good description for the first two, but the documentation really doesn't tell me what ByteString is good for.

ByteString is also documented at https://docs.python.org/3/library/typing.html#typing.ByteString, but that documentation has a couple of problems:

  • memoryview is not in fact registered as a ByteString
  • ByteString is not in fact generic (and it doesn't make sense for it to be generic)
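The registration claim in the first bullet can be checked at runtime with something like the sketch below. It is written defensively on two assumptions: that newer interpreters emit a DeprecationWarning when `ByteString` is accessed, and that the ABC may be absent entirely once it is removed, in which case the result is `None`.

```python
import collections.abc
import warnings

with warnings.catch_warnings():
    # Silence the deprecation notice that newer versions emit for ByteString.
    warnings.simplefilter("ignore", DeprecationWarning)
    # Fall back to None on interpreters where the ABC no longer exists.
    ByteString = getattr(collections.abc, "ByteString", None)
    if ByteString is None:
        registered = None
    else:
        registered = {
            tp.__name__: issubclass(tp, ByteString)
            for tp in (bytes, bytearray, memoryview)
        }

print(registered)
```

Where the ABC is still present, the `memoryview` entry is the interesting one to inspect, since the typing documentation quoted in this thread lists it alongside `bytes` and `bytearray`.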

> All around, I think this proposal is a contract violation and that a new ABC should be created. A scan on grep.app is insufficient to show this won't be a breaking change (most of the world's Python code isn't publicly visible).

That's a reasonable point. If we can't use the existing ByteString ABC for bytes | bytearray, I don't think it's worth creating a new ABC—as you said above, the union annotation is good enough.

But if we keep ByteString as is, with no methods, I have no idea what it's useful for. We occasionally get people trying to use the ABC in type annotations, so the current state causes confusion.

Perhaps we could deprecate ByteString, or explicitly document its limited use.

@rhettinger
Contributor

> Perhaps we could deprecate ByteString, or explicitly document its limited use.

I vote for deprecation because the name is bad (implying bytes | str) and it would just be a continuing point of confusion.

@JelleZijlstra JelleZijlstra changed the title from "Give collections.abc.ByteString some methods" to "Deprecate collections.abc.ByteString" Apr 25, 2022
@serhiy-storchaka
Member

The name seems good to me. It implies a bytes-like object with str methods (find(), lower(), isspace(), etc.).

The terms "buffer", "bytes-like", "bytestring" and "bytes string" are used loosely in the documentation, but there are several meanings of bytes-likeness:

  • Supports the buffer protocol.
  • Additionally supports len() which returns the size in bytes.
  • Additionally supports indexing.
  • Has most methods of str (except encode() of course).

Unfortunately there are no strongly defined terms and corresponding abstract classes, protocols or types in the code to express the requirements precisely.
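The four meanings listed above can be probed at runtime. The following is only a sketch: the helper names are invented here, the choice of `find`, `lower`, and `isspace` as representative str-like methods follows the examples in this comment, and the indexing probe merely checks for `__getitem__`.

```python
import array


def supports_buffer(obj) -> bool:
    """Level 1: the object exports the buffer protocol."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False


def len_is_size_in_bytes(obj) -> bool:
    """Level 2: len() agrees with the size of the buffer in bytes."""
    try:
        with memoryview(obj) as view:
            return len(obj) == view.nbytes
    except TypeError:
        return False


def supports_indexing(obj) -> bool:
    """Level 3: the object can be indexed."""
    return hasattr(obj, "__getitem__")


def has_str_like_methods(obj) -> bool:
    """Level 4: a sample of the str-like methods is present."""
    return all(hasattr(obj, name) for name in ("find", "lower", "isspace"))


def levels(obj):
    return (
        supports_buffer(obj),
        len_is_size_in_bytes(obj),
        supports_indexing(obj),
        has_str_like_methods(obj),
    )


for sample in (b"abc", bytearray(b"abc"), memoryview(b"abc"),
               array.array("H", [1, 2])):
    print(type(sample).__name__, levels(sample))
```

The `array.array` sample is included because its `len()` counts items rather than bytes, so it is a buffer without satisfying the second meaning.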

@ankith26

Hi! I'm one of the contributors to the pygame project.

I'm not sure whether this is the best place to be asking, but it is relevant to the usage of ByteString.

So we have a function implemented with the Python C API, and it uses y#, which according to the docs is a format string for a generic "bytes-like" sized object. The term "bytes-like" is defined in the glossary, which gives me the impression that any function using y# must accept a wide range of "bytes-like" objects. Simply using bytes | bytearray would be too narrow and would probably miss some kinds of objects. The same glossary entry also mentions that a "bytes-like object" is an object supporting the C-level buffer protocol (which, in my understanding, is also not fully exposed on the Python side).

I was looking for a suitable ABC to type-hint this, and the closest thing I could find that already exists is ByteString.
The next closest thing I found is typing.SupportsBytes, whose name gives me the impression that this is what I'm looking for, but weirdly enough, the concrete bytes object itself does not conform to this protocol (because it lacks the __bytes__ method).

I suppose a set of ABCs for the buffer protocol (if this were to be added) would be the closest replacement for ByteString, and would also work for my use case (typing the C-level y# arg format).
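A small sketch of why `typing.SupportsBytes` does not describe "bytes-like" objects: the protocol only asks for a `__bytes__` method, so a class can satisfy it without exporting a buffer at all. `Packet` and `is_buffer` are hypothetical names used for illustration only.

```python
from typing import SupportsBytes


class Packet:
    """Hypothetical type: convertible to bytes, but not a buffer."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def __bytes__(self) -> bytes:
        return b"\x01" + self._payload


def is_buffer(obj) -> bool:
    """True if the object exports the buffer protocol."""
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False


packet = Packet(b"abc")
print(isinstance(packet, SupportsBytes))  # satisfied: __bytes__ is defined
print(is_buffer(packet))                  # not a buffer
print(is_buffer(bytearray(b"abc")))       # a buffer
```

In other words, `SupportsBytes` answers "can `bytes(x)` be called?", which is a different question from "can a C function read this object's memory?".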

@TeamSpen210

The problem with the buffer protocol is that it doesn't have a visible Python API, so the ABC would be really weird, being entirely empty. See previous discussion at python/typing#593 and then PEP 688.
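For readers following the PEP 688 pointer: that PEP added `collections.abc.Buffer` in Python 3.12. The sketch below assumes nothing beyond that and falls back to a `memoryview` probe on older interpreters; the helper name `accepts_buffer` is invented for this example.

```python
import collections.abc

# collections.abc.Buffer exists on Python 3.12+ (PEP 688);
# on older interpreters this evaluates to None.
Buffer = getattr(collections.abc, "Buffer", None)


def accepts_buffer(obj) -> bool:
    """Report whether obj exports the buffer protocol."""
    if Buffer is not None:
        return isinstance(obj, Buffer)
    # Fallback for older versions: ask memoryview to wrap the object.
    try:
        with memoryview(obj):
            return True
    except TypeError:
        return False


print(accepts_buffer(b"abc"))
print(accepts_buffer(memoryview(b"abc")))
print(accepts_buffer("abc"))
```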

hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Feb 21, 2023
Getting a DeprecationWarning on issubclass proved to be difficult,
because it could affect unrelated looking things like
`isinstance(bytes, Sequence)`
@rhettinger rhettinger removed their assignment Mar 17, 2023
hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Mar 25, 2023
hauntsaninja added a commit to hauntsaninja/cpython that referenced this issue Apr 28, 2023
AlexWaygood added a commit to hauntsaninja/cpython that referenced this issue May 4, 2023
hugovk added a commit to hauntsaninja/cpython that referenced this issue May 4, 2023
JelleZijlstra pushed a commit that referenced this issue May 4, 2023
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Hugo van Kemenade <[email protected]>
carljm added a commit to carljm/cpython that referenced this issue May 5, 2023
* main: (61 commits)
  pythongh-64595: Argument Clinic: Touch source file if any output file changed (python#104152)
  pythongh-64631: Test exception messages in cloned Argument Clinic funcs (python#104167)
  pythongh-68395: Avoid naming conflicts by mangling variable names in Argument Clinic (python#104065)
  pythongh-64658: Expand Argument Clinic return converter docs (python#104175)
  pythonGH-103092: port `_asyncio` freelist to module state (python#104196)
  pythongh-104051: fix crash in test_xxtestfuzz with -We (python#104052)
  pythongh-104190: fix ubsan crash (python#104191)
  pythongh-104106: Add gcc fallback of mkfifoat/mknodat for macOS (pythongh-104129)
  pythonGH-104142: Fix _Py_RefcntAdd to respect immortality (pythonGH-104143)
  pythongh-104112: link from cached_property docs to method-caching FAQ (python#104113)
  pythongh-68968: Correcting message display issue with assertEqual (python#103937)
  pythonGH-103899: Provide a hint when accidentally calling a module (pythonGH-103900)
  pythongh-103963: fix 'make regen-opcode' in out-of-tree builds (python#104177)
  pythongh-102500: Add PEP 688 and 698 to the 3.12 release highlights (python#104174)
  pythonGH-81079: Add case_sensitive argument to `pathlib.Path.glob()` (pythonGH-102710)
  pythongh-91896: Deprecate collections.abc.ByteString (python#102096)
  pythongh-99593: Add tests for Unicode C API (part 2) (python#99868)
  pythongh-102500: Document PEP 688 (python#102571)
  pythongh-102500: Implement PEP 688 (python#102521)
  pythongh-96534: socketmodule: support FreeBSD divert(4) socket (python#96536)
  ...
@hauntsaninja
Contributor

This has been deprecated for 3.12

hauntsaninja added a commit to hauntsaninja/django-unicorn that referenced this issue May 8, 2023
This is an ABC that never really made much sense and was deprecated in python/cpython#91896
AlexWaygood pushed a commit that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (#91896) and has different semantics (#102092)
AlexWaygood pushed a commit to AlexWaygood/cpython that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (python#91896) and has different semantics (python#102092)
AlexWaygood added a commit that referenced this issue May 8, 2023
gh-102500: Remove mention of bytes shorthand (#104281)

The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

The reference to collections.abc.ByteString is also removed, since that object is deprecated (#91896) and has different semantics (#102092)

Although PEP 688 is new in Python 3.12, type checkers are expected to implement the new semantics for type annotations even if users are using an older version of Python, so this docs PR is backported to Python 3.11.

Co-authored-by: Shantanu <[email protected]>
AlexWaygood added a commit to AlexWaygood/cpython that referenced this issue May 8, 2023
jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this issue May 8, 2023
The bytes shorthand was removed in PEP 688:
https://peps.python.org/pep-0688/#no-special-meaning-for-bytes

I also remove the reference to `collections.abc.ByteString`, since that
object is deprecated (python#91896) and has different semantics (python#102092)
adamghill pushed a commit to adamghill/django-unicorn that referenced this issue May 9, 2023
This is an ABC that never really made much sense and was deprecated in python/cpython#91896
hauntsaninja added a commit to hauntsaninja/pycryptodome that referenced this issue May 9, 2023
typing.ByteString's behaviour was poorly specified. It is currently
scheduled for removal in Python 3.14.

See also python/cpython#91896
carljm added a commit to carljm/cpython that referenced this issue May 12, 2023
* main:
  pythongh-91896: Fixup some docs issues following ByteString deprecation (python#104422)
  pythonGH-104371: check return value of calling `mv.release` (python#104417)
  pythongh-104415: Fix refleak tests for `typing.ByteString` deprecation (python#104416)
  pythonGH-86275: Implementation of hypothesis stubs for property-based tests, with zoneinfo tests (python#22863)
  pythonGH-103082: Filter LINE events in VM, to simplify tool implementation. (pythonGH-104387)
  pythongh-93649: Split gc- and allocation tests from _testcapimodule.c (pythonGH-104403)
  pythongh-104389: Add 'unused' keyword to Argument Clinic C converters (python#104390)
  pythongh-101819: Prepare _io._IOBase for module state (python#104386)
  pythongh-104413: Fix refleak when super attribute throws AttributeError (python#104414)
  Fix refleak in `super_descr_get` (python#104408)
  pythongh-87526: Remove dead initialization from _zoneinfo parse_abbr() (python#24700)
  pythongh-91896: Improve visibility of `ByteString` deprecation warnings (python#104294)
  pythongh-104371: Fix calls to `__release_buffer__` while an exception is active (python#104378)
  pythongh-104377: fix cell in comprehension that is free in outer scope (python#104394)
  pythongh-104392: Remove _paramspec_tvars from typing (python#104393)
  pythongh-104396: uuid.py to skip platform check for emscripten and wasi (pythongh-104397)
  pythongh-99108: Refresh HACL* from upstream (python#104401)
  pythongh-104301: Allow leading whitespace in disambiguated pdb statements (python#104342)
carljm added a commit to carljm/cpython that referenced this issue May 15, 2023
* main: (29 commits)
  pythongh-101819: Fix _io clinic input for unused base class method stubs (python#104418)
  pythongh-101819: Isolate `_io` (python#101948)
  Bump mypy from 1.2.0 to 1.3.0 in /Tools/clinic (python#104501)
  pythongh-104494: Update certain Tkinter pack/place tests for Tk 8.7 errors (python#104495)
  pythongh-104050: Run mypy on `clinic.py` in CI (python#104421)
  pythongh-104490: Consistently define phony make targets (python#104491)
  pythongh-67056: document that registering/unregistering an atexit func from within an atexit func is undefined (python#104473)
  pythongh-104487: PYTHON_FOR_REGEN must be minimum Python 3.10 (python#104488)
  pythongh-101282: move BOLT config after PGO (pythongh-104493)
  pythongh-104469 Convert _testcapi/float.c to use AC (pythongh-104470)
  pythongh-104456: Fix ref leak in _ctypes.COMError (python#104457)
  pythongh-98539: Make _SSLTransportProtocol.abort() safe to call when closed (python#104474)
  pythongh-104337: Clarify random.gammavariate doc entry  (python#104410)
  Minor improvements to typing docs (python#104465)
  pythongh-87092: avoid gcc warning on uninitialized struct field in assemble.c (python#104460)
  pythonGH-71383: IDLE - Document testing subsets of modules (python#104463)
  pythongh-104454: Fix refleak in AttributeError_reduce (python#104455)
  pythongh-75710: IDLE - add docstrings and comments to editor module (python#104446)
  pythongh-91896: Revert some very noisy DeprecationWarnings for `ByteString` (python#104424)
  Add a mention of PYTHONBREAKPOINT to breakpoint() docs (python#104430)
  ...
carljm added a commit to carljm/cpython that referenced this issue May 15, 2023
* main: (204 commits)
@AlexWaygood AlexWaygood changed the title from "Deprecate collections.abc.ByteString" to "Deprecate collections.abc.ByteString and typing.ByteString" Jul 14, 2023
@AlexWaygood AlexWaygood changed the title from "Deprecate collections.abc.ByteString and typing.ByteString" to "Deprecate and schedule removal of collections.abc.ByteString and typing.ByteString" Jul 14, 2023
gertvdijk pushed a commit to gertvdijk/crc that referenced this issue Sep 18, 2023
Tested with mypy 1.5.1 on Python 3.11.

This changes `ByteString` into `Union[bytes, bytearray, memoryview]`.
See python/cpython#91896.

Python 3.11 documentation on `typing.ByteString` [1]:

> Deprecated since version 3.9, will be removed in version 3.14: Prefer
> typing_extensions.Buffer, or a union like
> `bytes | bytearray | memoryview`.

Python 3.8 documentation on `typing.ByteString` [2]:

> This type represents the types `bytes`, `bytearray`, and `memoryview`
> of byte sequences.
>
> As a shorthand for this type, `bytes` can be used to annotate
> arguments of any of the types mentioned above.

[1]: https://docs.python.org/3.11/library/typing.html#typing.ByteString
[2]: https://docs.python.org/3.8/library/typing.html#typing.ByteString
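The migration described in this commit message might look roughly like the sketch below. The `checksum` function is a made-up stand-in, not the crc project's actual code, and the alias name `BytesLike` is likewise an assumption.

```python
from typing import Union

# Before (deprecated):  def checksum(data: ByteString) -> int: ...
# After: an explicit union, spelled with Union so it also runs on
# interpreters that predate the `X | Y` syntax.
BytesLike = Union[bytes, bytearray, memoryview]


def checksum(data: BytesLike) -> int:
    """Hypothetical stand-in for a CRC routine: byte sum modulo 256."""
    # bytes() accepts all three members of the union.
    return sum(bytes(data)) % 256


print(checksum(b"\x01\x02"))
print(checksum(bytearray(b"\x01\x02")))
print(checksum(memoryview(b"\x01\x02")))
```

Unlike the `bytes | bytearray` union discussed earlier in the thread, including `memoryview` limits the function body to operations all three types share.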
gertvdijk added a commit to gertvdijk/crc that referenced this issue Sep 28, 2023
Tested with mypy 1.5.1 on Python 3.11.

This changes `ByteString` into `Union[bytes, bytearray, memoryview]`.
See python/cpython#91896.

Python 3.11 documentation on `typing.ByteString` [1]:

> Deprecated since version 3.9, will be removed in version 3.14: Prefer
> typing_extensions.Buffer, or a union like
> `bytes | bytearray | memoryview`.

Python 3.8 documentation on `typing.ByteString` [2]:

> This type represents the types `bytes`, `bytearray`, and `memoryview`
> of byte sequences.
>
> As a shorthand for this type, `bytes` can be used to annotate
> arguments of any of the types mentioned above.

While at it, also add the return types for special methods which are
still optional in mypy strict mode (Ruff's ANN024 rule [3]).

[1]: https://docs.python.org/3.11/library/typing.html#typing.ByteString
[2]: https://docs.python.org/3.8/library/typing.html#typing.ByteString
[3]: https://docs.astral.sh/ruff/rules/missing-return-type-special-method/
gertvdijk added a commit to gertvdijk/crc that referenced this issue Sep 30, 2023
gertvdijk added a commit to gertvdijk/crc that referenced this issue Sep 30, 2023
Nicoretti pushed a commit to Nicoretti/crc that referenced this issue Oct 1, 2023
will9288 added a commit to will9288/django-unicorn that referenced this issue May 6, 2024
This is an ABC that never really made much sense and was deprecated in python/cpython#91896