Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 670: Convert _PyObject_SIZE() and _PyObject_VAR_SIZE() macros to functions #99845

Closed
vstinner opened this issue Nov 28, 2022 · 7 comments
Closed
Labels
type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Nov 28, 2022

The _PyObject_SIZE() and _PyObject_VAR_SIZE() macros should be converted to functions: see PEP 670 for the rationale.

My problem is that I don't know if the return type should be signed (Py_ssize_t) or unsigned (size_t).

CPython usage of _PyObject_SIZE():

  • Signed: 18. Implementation of __sizeof__() methods.
  • Unsigned: 0
  • Implicit cast to unsigned: 2. Calls PyObject_Malloc(_PyObject_SIZE(tp)) and gc_alloc(_PyObject_SIZE(tp), presize) where the first argument type is size_t.

CPython usage of _PyObject_VAR_SIZE():

  • Signed: 5
  • Unsigned: 1
  • Implicit cast to unsigned: 1. Call _PyDebugAllocatorStats(..., _PyObject_VAR_SIZE(&PyTuple_Type, len)) where the argument type is size_t.

To get a container length, the C API uses signed type (Py_ssize_t): PyList_Size(), PyDict_Size(), Py_SIZE(), etc.

To allocate memory, the C API prefers unsigned type (size_t): PyMem_Malloc(), PyObject_Realloc(), etc.

Python allocator functions reject size greater than PY_SSIZE_T_MAX:

void *
PyMem_RawMalloc(size_t size)
{
    /*
     * Limit ourselves to PY_SSIZE_T_MAX bytes to prevent security holes.
     * Most python internals blindly use a signed Py_ssize_t to track
     * things without checking for overflows or negatives.
     * As size_t is unsigned, checking for size < 0 is not required.
     */
    if (size > (size_t)PY_SSIZE_T_MAX)
        return NULL;
    return _PyMem_Raw.malloc(_PyMem_Raw.ctx, size);
}

Some "sizeof" functions freely mix signed and unsigned types. Example:

static PyObject *
deque_sizeof(dequeobject *deque, void *unused)
{
    Py_ssize_t res;
    Py_ssize_t blocks;

    res = _PyObject_SIZE(Py_TYPE(deque));
    blocks = (size_t)(deque->leftindex + Py_SIZE(deque) + BLOCKLEN - 1) / BLOCKLEN;
    assert(deque->leftindex + Py_SIZE(deque) - 1 ==
           (blocks - 1) * BLOCKLEN + deque->rightindex);
    res += blocks * sizeof(block);
    return PyLong_FromSsize_t(res);
}

blocks and sizeof(block) are unsigned, but res is signed.


Another problem is that _PyObject_VAR_SIZE() has an undefined behavior on integer overflow. Maybe it should return SIZE_MAX on oveflow, to make sure that Python allocator function fail (return NULL)?

Linked PRs

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Nov 28, 2022
@vstinner
Copy link
Member Author

Aha, _PySys_GetSizeOf() returns size_t, but bytesio_sizeof() stores its result in a Py_ssize_t variable.

@vstinner
Copy link
Member Author

@corona10 @erlend-aasland @serhiy-storchaka: Are you ok to use an unsigned type for _PyObject_SIZE() and _PyObject_VAR_SIZE() result? See PR #99846 and PR #99847 for concrete changes to prepare converting these macros to functions.

@vstinner
Copy link
Member Author

vstinner commented Nov 28, 2022

In PyPI top 5000 projects, _PyObject_SIZE() and _PyObject_VAR_SIZE() are used with signed and unsigned types.

numpy uses unsigned:

numpy-1.23.2/numpy/core/src/multiarray/scalartypes.c.src: const size_t size = _PyObject_VAR_SIZE(type, nitems + 1);

Cython gdb plugin treats _PyObject_VAR_SIZE() macro result as unsigned:

Cython-0.29.32/Cython/Debugger/libpython.py: def _PyObject_VAR_SIZE(typeobj, nitems):
Cython-0.29.32/Cython/Debugger/libpython.py: if _PyObject_VAR_SIZE._type_size_t is None:
Cython-0.29.32/Cython/Debugger/libpython.py: _PyObject_VAR_SIZE._type_size_t = gdb.lookup_type('size_t')
Cython-0.29.32/Cython/Debugger/libpython.py: ).cast(_PyObject_VAR_SIZE._type_size_t)
Cython-0.29.32/Cython/Debugger/libpython.py: _PyObject_VAR_SIZE._type_size_t = None
Cython-0.29.32/Cython/Debugger/libpython.py: size = _PyObject_VAR_SIZE(typeobj, tsize)

JPype uses signed and unsigned types:

JPype1-1.4.0/native/python/pyjp_module.cpp: long v = _PyObject_VAR_SIZE(type, 1)+(PyJPValue_hasJavaSlot(type)?sizeof (JPValue):0);
JPype1-1.4.0/native/python/pyjp_value.cpp: const size_t size = _PyObject_VAR_SIZE(type, nitems + 1) + sizeof (JPValue);

	Py_ssize_t offset;
JPype1-1.4.0/native/python/pyjp_value.cpp: offset = _PyObject_VAR_SIZE(type, 1);

	Py_ssize_t offset;
JPype1-1.4.0/native/python/pyjp_value.cpp: offset = _PyObject_VAR_SIZE(type, sz + 1);

frozendict uses signed type (copy of CPython Objects/dictobject.c which also uses signed types currently):

    Py_ssize_t size, usable, res;
frozendict-2.3.4/frozendict/src/3_10/cpython_src/Objects/dictobject_original.c: res = _PyObject_SIZE(Py_TYPE(mp));

    Py_ssize_t size, usable, res;
frozendict-2.3.4/frozendict/src/3_6/cpython_src/Objects/dictobject.c: res = _PyObject_SIZE(Py_TYPE(mp));
frozendict-2.3.4/frozendict/src/3_6/cpython_src/Objects/dictobject_original.c: res = _PyObject_SIZE(Py_TYPE(mp));
frozendict-2.3.4/frozendict/src/3_7/cpython_src/Objects/dictobject_original.c: res = _PyObject_SIZE(Py_TYPE(mp));
frozendict-2.3.4/frozendict/src/3_8/cpython_src/Objects/dictobject_original.c: res = _PyObject_SIZE(Py_TYPE(mp));
frozendict-2.3.4/frozendict/src/3_9/cpython_src/Objects/dictobject_original.c: res = _PyObject_SIZE(Py_TYPE(mp));

pickle uses signed:

    Py_ssize_t res, s;
pickle5-0.0.12/pickle5/_pickle.c: res = _PyObject_SIZE(Py_TYPE(self));

    Py_ssize_t res;
pickle5-0.0.12/pickle5/_pickle.c: res = _PyObject_SIZE(Py_TYPE(self));

recordclass uses signed:

recordclass-0.17.5/lib/recordclass/_dataobject.c: // Py_ssize_t size = _PyObject_SIZE(type);
recordclass-0.17.5/lib/recordclass/_litetuple.c: Py_ssize_t size = _PyObject_VAR_SIZE(tp, nitems);

Nuitka uses unsigned:

    size_t size;
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersCalling2.c: size = _PyObject_VAR_SIZE(type, tsize);

    size_t size;
Nuitka-1.0.6/nuitka/build/static_src/HelpersAttributes.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersAttributes.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersAttributes.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersAttributes.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/static_src/HelpersAttributes.c: size = _PyObject_VAR_SIZE(type, tsize);
Nuitka-1.0.6/nuitka/build/include/nuitka/allocator.h: size_t size = _PyObject_VAR_SIZE(tp, nitems);

    size_t size;
Nuitka-1.0.6/nuitka/code_generation/templates_c/CodeTemplateCallsMethodPositional.c.j2: size = _PyObject_VAR_SIZE(type, tsize);

pyobjc-core casts implicitly _PyObject_SIZE() result to unsigned (size_t):

pyobjc-core-8.5/Modules/objc/varlist.m: result = (PyObjC_VarList*)PyObject_Malloc(_PyObject_SIZE(&PyObjC_VarList_Type)

@vstinner
Copy link
Member Author

I created PR #99848 for Objects/dictobject.c.

@vstinner
Copy link
Member Author

The final change should be PR #99850 which convert the two macros to functions.

vstinner added a commit that referenced this issue Nov 29, 2022
* Change _PyDict_KeysSize() and shared_keys_usable_size() return type
  from signed (Py_ssize_t) to unsigned (size_t) type.
* new_values() argument type is now unsigned (size_t).
* init_inline_values() now uses size_t rather than int for the 'i'
  iterator variable.
* type.__sizeof__() implementation now uses unsigned (size_t) type.
vstinner added a commit that referenced this issue Nov 29, 2022
* code_sizeof() now uses an unsigned type (size_t) to compute the result.
* Fix _PyObject_ComputedDictPointer(): cast _PyObject_VAR_SIZE() to
  Py_ssize_t, rather than long: it's a different type on 64-bit Windows.
* Clarify that _PyObject_VAR_SIZE() uses an unsigned type (size_t).
@vstinner
Copy link
Member Author

I was confused by the "SIZE" name:

  • Py_SIZE() returns the number of elements in a container like a list or a dict: it has no unit.
  • _PyObject_SIZE() is different: it returns an object size in bytes: it has an unit.

IMO for bytes, it's better to use unsigned numbers. In the Python API, it's not an issue, it's an object. For example, _PySys_GetSizeOf() and _PyType_PreHeaderSize() return types are unsigned (size_t).

vstinner added a commit that referenced this issue Nov 30, 2022
The implementation of __sizeof__() methods using _PyObject_SIZE() now
use an unsigned type (size_t) to compute the size, rather than a signed
type (Py_ssize_t).

Cast explicitly signed (Py_ssize_t) values to unsigned type
(Py_ssize_t).
vstinner added a commit that referenced this issue Nov 30, 2022
Convert macros to static inline functions to avoid macro pitfalls,
like duplication of side effects:

* _PyObject_SIZE()
* _PyObject_VAR_SIZE()

The result type is size_t (unsigned).
vstinner added a commit that referenced this issue Dec 1, 2022
Cast size_t to Py_ssize_t, rather than casting it to long. On 64-bit
Windows, long is 32-bit whereas Py_ssize_t is 64-bit.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 1, 2022
…H-99922)

Cast size_t to Py_ssize_t, rather than casting it to long. On 64-bit
Windows, long is 32-bit whereas Py_ssize_t is 64-bit.
(cherry picked from commit 9707bf2)

Co-authored-by: Victor Stinner <[email protected]>
miss-islington added a commit that referenced this issue Dec 1, 2022
Cast size_t to Py_ssize_t, rather than casting it to long. On 64-bit
Windows, long is 32-bit whereas Py_ssize_t is 64-bit.
(cherry picked from commit 9707bf2)

Co-authored-by: Victor Stinner <[email protected]>
carljm added a commit to carljm/cpython that referenced this issue Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
@vstinner
Copy link
Member Author

vstinner commented Dec 5, 2022

Fixed by 131801d

@vstinner vstinner closed this as completed Dec 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

1 participant