GH-101291: Rearrange the size bits in PyLongObject #102464

Merged on Mar 22, 2023 (37 commits)

Commits
0ec07e4
Add functions to hide some internals of long object.
markshannon Jan 25, 2023
292b9d0
Add internal functions to longobject.c for setting sign and digit count.
markshannon Jan 25, 2023
5c54894
Replace Py_SIZE(x) < 0 with _PyLong_IsNegative(x) in longobject.c
markshannon Feb 28, 2023
029aaa4
Replace Py_ABS(Py_SIZE(a)) with _PyLong_DigitCount(a) in longobject.c
markshannon Feb 28, 2023
b56e6da
Remove many uses of Py_SIZE in longobject.c
markshannon Feb 28, 2023
91269fc
Remove _PyLong_AssignValue, as it is no longer used.
markshannon Feb 28, 2023
c48e825
Remove some more uses of Py_SIZE in longobject.c.
markshannon Feb 28, 2023
449c0e2
Remove a few more uses of Py_SIZE in longobject.c.
markshannon Mar 1, 2023
c5ba601
Remove some more uses of Py_SIZE, replacing with _PyLong_UnsignedDigi…
markshannon Mar 1, 2023
4b3a3e8
Replace a few Py_SIZE() with _PyLong_SameSign().
markshannon Mar 1, 2023
9ef9d2c
Remove a few more Py_SIZE() from longobject.c
markshannon Mar 1, 2023
9c408c1
Replace uses of IS_MEDIUM_VALUE macro with _PyLong_IsSingleDigit.
markshannon Mar 1, 2023
548d656
Remove most of the remaining uses of Py_SIZE in longobject.c
markshannon Mar 1, 2023
3e3fefd
Replace last remaining uses of Py_SIZE applied to longobject with _Py…
markshannon Mar 1, 2023
391fb51
Don't use _PyObject_InitVar and move a couple of inline functions to …
markshannon Mar 1, 2023
df8c7d3
Correct name of inline function.
markshannon Mar 1, 2023
bc14fa6
Eliminate all remaining uses of Py_SIZE and Py_SET_SIZE on PyLongObject.
markshannon Mar 1, 2023
54c6f1b
Change layout of size/sign bits in longobject to support future addit…
markshannon Mar 2, 2023
ce6bfb2
Test pairs of longs together on fast path of add/mul/sub.
markshannon Mar 2, 2023
4c1956b
Tidy up comment and delete commented out code.
markshannon Mar 6, 2023
301158b
Add news.
markshannon Mar 6, 2023
1aa1891
Remove debugging asserts.
markshannon Mar 6, 2023
bf2a9af
Fix storage classes.
markshannon Mar 6, 2023
169f521
Remove development debug functions.
markshannon Mar 6, 2023
90f9072
Avoid casting to smaller int.
markshannon Mar 8, 2023
f143443
Apply suggestions from code review.
markshannon Mar 8, 2023
a0d661e
Widen types to avoid data loss.
markshannon Mar 8, 2023
145a2e4
Fix syntax error.
markshannon Mar 8, 2023
638a98f
Replace 'SingleDigit' with 'Compact' as the term 'single digit' seems…
markshannon Mar 9, 2023
7f5acc0
Address review comments.
markshannon Mar 16, 2023
b06bb6f
Merge branch 'main' into long-rearrange-size-bits
markshannon Mar 16, 2023
a19b0a7
Merge branch 'main' into long-rearrange-size-bits
markshannon Mar 16, 2023
87f49b2
Fix _PyLong_Sign
markshannon Mar 16, 2023
f764aa8
Replace _PyLong_Sign(x) < 0 with _PyLong_IsNegative(x).
markshannon Mar 16, 2023
9843ac0
fix sign check
markshannon Mar 16, 2023
d6cb917
Address some review comments.
markshannon Mar 22, 2023
469d26f
Change asserts on digit counts to asserts on sign where applicable.
markshannon Mar 22, 2023
6 changes: 5 additions & 1 deletion Include/cpython/longintrepr.h
@@ -80,7 +80,7 @@ typedef long stwodigits; /* signed variant of twodigits */
*/
Contributor

@verhovsky verhovsky Apr 29, 2023

You didn't update the comment that documents _longobject; it still refers to ob_size and PyVarObject:

/* Long integer representation.
   The absolute value of a number is equal to
        SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
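
The formula quoted above can be sketched in a few lines of C. This is only an illustration: `demo_digit`, `DEMO_SHIFT` and `demo_abs_value` are hypothetical names, 30-bit digits are assumed (CPython chooses the digit width at build time), and the result is limited to values that fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPython's digit type and PyLong_SHIFT;
 * 30-bit digits are an assumption made for this sketch. */
typedef uint32_t demo_digit;
#define DEMO_SHIFT 30

/* Absolute value as SUM(digits[i] * 2**(DEMO_SHIFT*i)) for i in [0, ndigits).
 * Digits are stored least significant first. The caller must make sure
 * the result fits in 64 bits. */
static uint64_t
demo_abs_value(const demo_digit *digits, size_t ndigits)
{
    assert(ndigits <= 3);
    uint64_t value = 0;
    /* Walk from the most significant digit down (Horner's scheme). */
    for (size_t i = ndigits; i-- > 0; ) {
        value = (value << DEMO_SHIFT) | digits[i];
    }
    return value;
}
```

With digits `{1, 1}` the function returns 2**30 + 1, since the second digit is weighted by 2**30.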


typedef struct _PyLongValue {
- Py_ssize_t ob_size; /* Number of items in variable part */
+ uintptr_t lv_tag; /* Number of digits, sign and flags */
digit ob_digit[1];
} _PyLongValue;

@@ -94,6 +94,10 @@ PyAPI_FUNC(PyLongObject *) _PyLong_New(Py_ssize_t);
/* Return a copy of src. */
PyAPI_FUNC(PyObject *) _PyLong_Copy(PyLongObject *src);

PyAPI_FUNC(PyLongObject *)
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits);


#ifdef __cplusplus
}
#endif
164 changes: 146 additions & 18 deletions Include/internal/pycore_long.h
@@ -82,8 +82,6 @@ PyObject *_PyLong_Add(PyLongObject *left, PyLongObject *right);
PyObject *_PyLong_Multiply(PyLongObject *left, PyLongObject *right);
PyObject *_PyLong_Subtract(PyLongObject *left, PyLongObject *right);

- int _PyLong_AssignValue(PyObject **target, Py_ssize_t value);

/* Used by Python/mystrtoul.c, _PyBytes_FromHex(),
_PyBytes_DecodeEscape(), etc. */
PyAPI_DATA(unsigned char) _PyLong_DigitValue[256];
@@ -110,25 +108,155 @@ PyAPI_FUNC(char*) _PyLong_FormatBytesWriter(
int base,
int alternate);

- /* Return 1 if the argument is positive single digit int */
/* Long value tag bits:
* 0-1: Sign bits value = (1-sign), ie. negative=2, positive=0, zero=1.
* 2: Reserved for immortality bit
Comment on lines +112 to +113
Contributor

@eduardo-elizondo eduardo-elizondo Apr 7, 2023

I don't think we need an immortality flag here, but we do need a static flag: immortality should be marked by the refcount, and this flag marks whether the object is static. With it, we can do the static check at dealloc time to prevent these objects from being deallocated.

* 3+ Unsigned digit count
*/
#define SIGN_MASK 3
#define SIGN_ZERO 1
#define SIGN_NEGATIVE 2
#define NON_SIZE_BITS 3

/* All "compact" values are guaranteed to fit into
* a Py_ssize_t with at least one bit to spare.
* In other words, for 64 bit machines, compact
* will be signed 63 (or fewer) bit values
Member

Maybe also add that compact values have at most one digit? I've seen some code depending on that (e.g. _PyLong_Multiply).

Member Author

Not with tagged ints. In theory, a compact int could have 5 digits (63-bit compact ints and 15-bit digits).

For a sensible implementation, a compact int will be one or two digits.
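
The figure of 5 digits follows from a ceiling division of the value width by the digit width. A minimal sketch of that arithmetic, where `demo_digits_needed` is a hypothetical helper and not part of the PR:

```c
#include <assert.h>

/* Number of digits needed to hold value_bits bits when each digit
 * stores digit_bits bits (ceiling division). */
static int
demo_digits_needed(int value_bits, int digit_bits)
{
    assert(value_bits > 0 && digit_bits > 0);
    return (value_bits + digit_bits - 1) / digit_bits;
}
```

`demo_digits_needed(63, 15)` evaluates to 5, matching the number given in the comment.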

*/

/* Return 1 if the argument is compact int */
static inline int
_PyLong_IsNonNegativeCompact(const PyLongObject* op) {
assert(PyLong_Check(op));
return op->long_value.lv_tag <= (1 << NON_SIZE_BITS);
gvanrossum marked this conversation as resolved.
Contributor

@eduardo-elizondo eduardo-elizondo Apr 7, 2023

This doesn't work if we set the reserved (immortal/static) bit, which is bit 2. For example, the immortal small int 1 would have an lv_tag of 0b1100, so this check would return an incorrect value.

I'll create a new PR to restructure this a bit to make it work with the new bit flag.

cc @ericsnowcurrently
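
The failure described in this comment can be reproduced on bare tag values. The sketch below copies the comparison from the PR and adds a masked variant; the masked variant and all `demo_` names are assumptions for illustration and do not show what the follow-up PR actually does.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NON_SIZE_BITS 3
#define DEMO_RESERVED_BIT ((uintptr_t)1 << 2)  /* bit 2, reserved in this PR */

/* The comparison used in this PR: true for zero (tag 1) and for positive
 * one-digit ints (tag 8), as long as the reserved bit is clear. */
static int
demo_is_nonneg_compact(uintptr_t tag)
{
    return tag <= ((uintptr_t)1 << DEMO_NON_SIZE_BITS);
}

/* Hypothetical variant that clears the reserved bit before comparing. */
static int
demo_is_nonneg_compact_masked(uintptr_t tag)
{
    return (tag & ~DEMO_RESERVED_BIT) <= ((uintptr_t)1 << DEMO_NON_SIZE_BITS);
}
```

For the tag 0b1100 (one digit, positive, reserved bit set) the first function returns 0 while the masked one returns 1.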

}

static inline int
_PyLong_IsCompact(const PyLongObject* op) {
assert(PyLong_Check(op));
return op->long_value.lv_tag < (2 << NON_SIZE_BITS);
}

static inline int
- _PyLong_IsPositiveSingleDigit(PyObject* sub) {
- /* For a positive single digit int, the value of Py_SIZE(sub) is 0 or 1.
-
- We perform a fast check using a single comparison by casting from int
- to uint which casts negative numbers to large positive numbers.
- For details see Section 14.2 "Bounds Checking" in the Agner Fog
- optimization manual found at:
- https://www.agner.org/optimize/optimizing_cpp.pdf
-
- The function is not affected by -fwrapv, -fno-wrapv and -ftrapv
- compiler options of GCC and clang
- */
- assert(PyLong_CheckExact(sub));
- Py_ssize_t signed_size = Py_SIZE(sub);
- return ((size_t)signed_size) <= 1;
+ _PyLong_BothAreCompact(const PyLongObject* a, const PyLongObject* b) {
+ assert(PyLong_Check(a));
+ assert(PyLong_Check(b));
+ return (a->long_value.lv_tag | b->long_value.lv_tag) < (2 << NON_SIZE_BITS);
}
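
The single comparison in `_PyLong_BothAreCompact` works because the threshold is a power of two. A minimal sketch on bare tag values, with hypothetical `demo_` names, that makes the equivalence with two separate checks explicit:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NON_SIZE_BITS 3

/* A tag is compact when its digit count (bits 3 and up) is 0 or 1,
 * i.e. when the whole tag is below 2 << 3 == 16. */
static int
demo_is_compact(uintptr_t tag)
{
    return tag < ((uintptr_t)2 << DEMO_NON_SIZE_BITS);
}

/* One comparison for two tags: since 16 is a power of two, (a | b) is
 * below 16 exactly when neither a nor b has a bit set at position 4 or
 * higher, which is the same as both being below 16. */
static int
demo_both_compact(uintptr_t a, uintptr_t b)
{
    return (a | b) < ((uintptr_t)2 << DEMO_NON_SIZE_BITS);
}
```

The trick would not hold for a threshold that is not a power of two, because OR-ing two small values could then cross it.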

/* Returns a *compact* value, iff `_PyLong_IsCompact` is true for `op`.
*
* "Compact" values have at least one bit to spare,
* so that addition and subtraction can be performed on the values
* without risk of overflow.
*/
static inline Py_ssize_t
_PyLong_CompactValue(const PyLongObject *op)
{
assert(PyLong_Check(op));
assert(_PyLong_IsCompact(op));
Py_ssize_t sign = 1 - (op->long_value.lv_tag & SIGN_MASK);
return sign * (Py_ssize_t)op->long_value.ob_digit[0];
}

static inline bool
_PyLong_IsZero(const PyLongObject *op)
{
return (op->long_value.lv_tag & SIGN_MASK) == SIGN_ZERO;
}

static inline bool
_PyLong_IsNegative(const PyLongObject *op)
{
return (op->long_value.lv_tag & SIGN_MASK) == SIGN_NEGATIVE;
}

static inline bool
_PyLong_IsPositive(const PyLongObject *op)
{
return (op->long_value.lv_tag & SIGN_MASK) == 0;
Member

Why not have #define SIGN_POSITIVE 0?

Member Author

I want these functions to be the only way to determine the sign.
Defining SIGN_POSITIVE will just encourage people to do the test elsewhere.

Member

Sure, fine. Next question: maybe we also need a _PyLong_IsNonZero? I see !_PyLong_IsZero a lot, and the ! is easily missed. (Or maybe that's just my old eyes.) Possibly also IsNonNegative and IsNonPositive.
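
The helpers floated in this comment could look roughly like the sketch below. None of them is defined in this PR: the `demo_` names are hypothetical, and the functions take a bare tag value rather than a `PyLongObject` so that the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SIGN_MASK 3
#define DEMO_SIGN_ZERO 1
#define DEMO_SIGN_NEGATIVE 2

/* Sign bits encode 1 - sign: 0 positive, 1 zero, 2 negative. */
static inline bool
demo_is_nonzero(uintptr_t tag)
{
    return (tag & DEMO_SIGN_MASK) != DEMO_SIGN_ZERO;
}

static inline bool
demo_is_nonnegative(uintptr_t tag)
{
    return (tag & DEMO_SIGN_MASK) != DEMO_SIGN_NEGATIVE;
}

static inline bool
demo_is_nonpositive(uintptr_t tag)
{
    /* Positive is the only sign encoded as 0. */
    return (tag & DEMO_SIGN_MASK) != 0;
}
```

Each helper is a single masked comparison, so it costs the same as the negated form it would replace.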

}

static inline Py_ssize_t
_PyLong_DigitCount(const PyLongObject *op)
{
assert(PyLong_Check(op));
return op->long_value.lv_tag >> NON_SIZE_BITS;
}

/* Equivalent to _PyLong_DigitCount(op) * _PyLong_NonCompactSign(op) */
static inline Py_ssize_t
_PyLong_SignedDigitCount(const PyLongObject *op)
{
assert(PyLong_Check(op));
Py_ssize_t sign = 1 - (op->long_value.lv_tag & SIGN_MASK);
return sign * (Py_ssize_t)(op->long_value.lv_tag >> NON_SIZE_BITS);
}

static inline int
_PyLong_CompactSign(const PyLongObject *op)
{
assert(PyLong_Check(op));
assert(_PyLong_IsCompact(op));
return 1 - (op->long_value.lv_tag & SIGN_MASK);
}

Comment on lines +196 to +203
Contributor

Shouldn't this be the new implementation of _PyLong_Sign(), if _PyLong_NonCompactSign() is removed? This gets rid of a branch in the proposed version of _PyLong_Sign().

Member

They sure look identical to me. Maybe Mark has plans and maybe the compiler would optimize this anyway?

if (P(x))
  return F(x);
else
  return F(x);

could just become return F(x);.

Member Author

We want the freedom to implement the "compact" and non-compact forms differently.
They have the same implementation at the moment, but that will change.

_PyLong_Sign() is part of the ABI, so we need to retain it. But almost all code using _PyLong_Sign() actually wants to know if an int is negative and should be using _PyLong_IsNegative().

static inline int
_PyLong_NonCompactSign(const PyLongObject *op)
{
assert(PyLong_Check(op));
assert(!_PyLong_IsCompact(op));
return 1 - (op->long_value.lv_tag & SIGN_MASK);
}

/* Do a and b have the same sign? */
static inline int
_PyLong_SameSign(const PyLongObject *a, const PyLongObject *b)
{
return (a->long_value.lv_tag & SIGN_MASK) == (b->long_value.lv_tag & SIGN_MASK);
}

#define TAG_FROM_SIGN_AND_SIZE(sign, size) ((1 - (sign)) | ((size) << NON_SIZE_BITS))
Contributor

size should be cast to size_t before shifting, and the result cast to Py_ssize_t to avoid UB.

I also haven't checked the assembly here, but I don't really know what happens when OR-ing a signed 64-bit int with a signed 32-bit int, and if this is doing work that's not strictly necessary.

Member Author

It is only in _PyLong_SetSignAndSize that size is a variable. I'll do the conversion there.

Member

So maybe add a comment that this macro should only be used with literal or size_t arguments?
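
The point about casting before the shift can be illustrated with a function form of the macro. This is a sketch under assumptions: `demo_tag_from_sign_and_size` is a hypothetical name, and `ptrdiff_t` stands in for `Py_ssize_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NON_SIZE_BITS 3

/* Build a tag from a sign (-1, 0 or 1) and a non-negative digit count.
 * The count is converted to an unsigned type before shifting, because
 * left-shifting a signed value is undefined when the result does not
 * fit in the signed type. */
static uintptr_t
demo_tag_from_sign_and_size(int sign, ptrdiff_t size)
{
    assert(size >= 0);
    assert(-1 <= sign && sign <= 1);
    assert(sign != 0 || size == 0);
    return (uintptr_t)(1 - sign) | ((uintptr_t)size << DEMO_NON_SIZE_BITS);
}
```

Doing the conversion inside a function keeps callers from passing a signed variable straight into the shift, which is the risk raised in the review.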


static inline void
_PyLong_SetSignAndDigitCount(PyLongObject *op, int sign, Py_ssize_t size)
{
assert(size >= 0);
assert(-1 <= sign && sign <= 1);
assert(sign != 0 || size == 0);
op->long_value.lv_tag = TAG_FROM_SIGN_AND_SIZE(sign, (size_t)size);
}

static inline void
_PyLong_SetDigitCount(PyLongObject *op, Py_ssize_t size)
{
assert(size >= 0);
op->long_value.lv_tag = (((size_t)size) << NON_SIZE_BITS) | (op->long_value.lv_tag & SIGN_MASK);
}

#define NON_SIZE_MASK ~((1 << NON_SIZE_BITS) - 1)

static inline void
_PyLong_FlipSign(PyLongObject *op) {
unsigned int flipped_sign = 2 - (op->long_value.lv_tag & SIGN_MASK);
op->long_value.lv_tag &= NON_SIZE_MASK;
op->long_value.lv_tag |= flipped_sign;
}

#define _PyLong_DIGIT_INIT(val) \
{ \
.ob_base = _PyObject_IMMORTAL_INIT(&PyLong_Type), \
.long_value = { \
.lv_tag = TAG_FROM_SIGN_AND_SIZE( \
(val) == 0 ? 0 : ((val) < 0 ? -1 : 1), \
(val) == 0 ? 0 : 1), \
{ ((val) >= 0 ? (val) : -(val)) }, \
} \
}

#define _PyLong_FALSE_TAG TAG_FROM_SIGN_AND_SIZE(0, 0)
#define _PyLong_TRUE_TAG TAG_FROM_SIGN_AND_SIZE(1, 1)

#ifdef __cplusplus
}
#endif
3 changes: 2 additions & 1 deletion Include/internal/pycore_object.h
@@ -136,8 +136,9 @@ static inline void
_PyObject_InitVar(PyVarObject *op, PyTypeObject *typeobj, Py_ssize_t size)
{
assert(op != NULL);
- Py_SET_SIZE(op, size);
+ assert(typeobj != &PyLong_Type);
_PyObject_Init((PyObject *)op, typeobj);
+ Py_SET_SIZE(op, size);
}


10 changes: 1 addition & 9 deletions Include/internal/pycore_runtime_init.h
@@ -8,6 +8,7 @@ extern "C" {
# error "this header requires Py_BUILD_CORE define"
#endif

#include "pycore_long.h"
#include "pycore_object.h"
#include "pycore_parser.h"
#include "pycore_pymem_init.h"
@@ -130,15 +131,6 @@ extern PyTypeObject _PyExc_MemoryError;

// global objects

- #define _PyLong_DIGIT_INIT(val) \
- { \
- .ob_base = _PyObject_IMMORTAL_INIT(&PyLong_Type), \
- .long_value = { \
- ((val) == 0 ? 0 : ((val) > 0 ? 1 : -1)), \
- { ((val) >= 0 ? (val) : -(val)) }, \
- } \
- }
-
#define _PyBytes_SIMPLE_INIT(CH, LEN) \
{ \
_PyVarObject_IMMORTAL_INIT(&PyBytes_Type, (LEN)), \
8 changes: 7 additions & 1 deletion Include/object.h
@@ -138,8 +138,13 @@ static inline PyTypeObject* Py_TYPE(PyObject *ob) {
# define Py_TYPE(ob) Py_TYPE(_PyObject_CAST(ob))
#endif

PyAPI_DATA(PyTypeObject) PyLong_Type;
PyAPI_DATA(PyTypeObject) PyBool_Type;

// bpo-39573: The Py_SET_SIZE() function must be used to set an object size.
static inline Py_ssize_t Py_SIZE(PyObject *ob) {
assert(ob->ob_type != &PyLong_Type);
assert(ob->ob_type != &PyBool_Type);
PyVarObject *var_ob = _PyVarObject_CAST(ob);
return var_ob->ob_size;
}
@@ -171,8 +176,9 @@ static inline void Py_SET_TYPE(PyObject *ob, PyTypeObject *type) {
# define Py_SET_TYPE(ob, type) Py_SET_TYPE(_PyObject_CAST(ob), type)
#endif


static inline void Py_SET_SIZE(PyVarObject *ob, Py_ssize_t size) {
assert(ob->ob_base.ob_type != &PyLong_Type);
assert(ob->ob_base.ob_type != &PyBool_Type);
ob->ob_size = size;
}
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 < 0x030b0000
@@ -0,0 +1,7 @@
Rearrange bits in first field (after header) of PyLongObject.
* Bits 0 and 1: 1 - sign. I.e. 0 for positive numbers, 1 for zero and 2 for negative numbers.
* Bit 2 reserved (probably for the immortal bit)
* Bits 3+ the unsigned size.

This makes a few operations slightly more efficient, and will enable a more
compact and faster 2s-complement representation of most ints in future.
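
The layout described in this news entry can be decoded with a mask and a shift. A minimal sketch, assuming the tag is held in a plain `uintptr_t`; the `demo_` names are hypothetical and stand in for the PR's inline helpers rather than reproducing them.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SIGN_MASK 3
#define DEMO_NON_SIZE_BITS 3

/* Bits 0-1 hold (1 - sign): 0 positive, 1 zero, 2 negative. */
static int
demo_sign(uintptr_t tag)
{
    return 1 - (int)(tag & DEMO_SIGN_MASK);
}

/* Bits 3 and up hold the unsigned digit count. */
static uintptr_t
demo_digit_count(uintptr_t tag)
{
    return tag >> DEMO_NON_SIZE_BITS;
}

/* Negation: 2 - s maps 0 <-> 2 and leaves 1 (zero) unchanged.
 * Unlike the PR's helper, this sketch leaves bit 2 untouched. */
static uintptr_t
demo_flip_sign(uintptr_t tag)
{
    assert((tag & DEMO_SIGN_MASK) != 3);  /* 3 is not a valid sign code */
    uintptr_t flipped = 2 - (tag & DEMO_SIGN_MASK);
    return (tag & ~(uintptr_t)DEMO_SIGN_MASK) | flipped;
}
```

For example, the tag 18 (binary 10010) decodes to sign -1 with 2 digits, and flipping its sign gives 16.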
43 changes: 8 additions & 35 deletions Modules/_decimal/_decimal.c
@@ -30,6 +30,7 @@
#endif

#include <Python.h>
#include "pycore_long.h" // _PyLong_IsZero()
#include "pycore_pystate.h" // _PyThreadState_GET()
#include "complexobject.h"
#include "mpdecimal.h"
@@ -2146,35 +2147,25 @@ dec_from_long(PyTypeObject *type, PyObject *v,
{
PyObject *dec;
PyLongObject *l = (PyLongObject *)v;
- Py_ssize_t ob_size;
- size_t len;
- uint8_t sign;

dec = PyDecType_New(type);
if (dec == NULL) {
return NULL;
}

- ob_size = Py_SIZE(l);
- if (ob_size == 0) {
+ if (_PyLong_IsZero(l)) {
_dec_settriple(dec, MPD_POS, 0, 0);
return dec;
}

- if (ob_size < 0) {
- len = -ob_size;
- sign = MPD_NEG;
- }
- else {
- len = ob_size;
- sign = MPD_POS;
- }
+ uint8_t sign = _PyLong_IsNegative(l) ? MPD_NEG : MPD_POS;

- if (len == 1) {
- _dec_settriple(dec, sign, *l->long_value.ob_digit, 0);
+ if (_PyLong_IsCompact(l)) {
+ _dec_settriple(dec, sign, l->long_value.ob_digit[0], 0);
mpd_qfinalize(MPD(dec), ctx, status);
return dec;
}
+ size_t len = _PyLong_DigitCount(l);

#if PYLONG_BITS_IN_DIGIT == 30
mpd_qimport_u32(MPD(dec), l->long_value.ob_digit, len, sign, PyLong_BASE,
@@ -3482,7 +3473,6 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
PyLongObject *pylong;
digit *ob_digit;
size_t n;
- Py_ssize_t i;
mpd_t *x;
mpd_context_t workctx;
uint32_t status = 0;
@@ -3536,26 +3526,9 @@ dec_as_long(PyObject *dec, PyObject *context, int round)
}

assert(n > 0);
- pylong = _PyLong_New(n);
- if (pylong == NULL) {
- mpd_free(ob_digit);
- mpd_del(x);
- return NULL;
- }
-
- memcpy(pylong->long_value.ob_digit, ob_digit, n * sizeof(digit));
+ assert(!mpd_iszero(x));
+ pylong = _PyLong_FromDigits(mpd_isnegative(x), n, ob_digit);
mpd_free(ob_digit);
-
- i = n;
- while ((i > 0) && (pylong->long_value.ob_digit[i-1] == 0)) {
- i--;
- }
-
- Py_SET_SIZE(pylong, i);
- if (mpd_isnegative(x) && !mpd_iszero(x)) {
- Py_SET_SIZE(pylong, -i);
- }
-
mpd_del(x);
return (PyObject *) pylong;
}
2 changes: 1 addition & 1 deletion Modules/_testcapi/mem.c
@@ -347,7 +347,7 @@ test_pyobject_new(PyObject *self, PyObject *Py_UNUSED(ignored))
{
PyObject *obj;
PyTypeObject *type = &PyBaseObject_Type;
- PyTypeObject *var_type = &PyLong_Type;
+ PyTypeObject *var_type = &PyBytes_Type;

// PyObject_New()
obj = PyObject_New(PyObject, type);
7 changes: 5 additions & 2 deletions Modules/_tkinter.c
@@ -32,6 +32,8 @@ Copyright (C) 1994 Steen Lumholt.
# include "pycore_fileutils.h" // _Py_stat()
#endif

#include "pycore_long.h"

#ifdef MS_WINDOWS
#include <windows.h>
#endif
@@ -886,7 +888,8 @@ asBignumObj(PyObject *value)
const char *hexchars;
mp_int bigValue;

- neg = Py_SIZE(value) < 0;
+ assert(PyLong_Check(value));
+ neg = _PyLong_IsNegative((PyLongObject *)value);
Comment on lines +891 to +892
Member

Please put the blank line back.

Suggested change
- assert(PyLong_Check(value));
- neg = _PyLong_IsNegative((PyLongObject *)value);
+ assert(PyLong_Check(value));
+ neg = _PyLong_IsNegative((PyLongObject *)value);
+
hexstr = _PyLong_Format(value, 16);
if (hexstr == NULL)
return NULL;
@@ -1950,7 +1953,7 @@ _tkinter_tkapp_getboolean(TkappObject *self, PyObject *arg)
int v;

if (PyLong_Check(arg)) { /* int or bool */
- return PyBool_FromLong(Py_SIZE(arg) != 0);
+ return PyBool_FromLong(!_PyLong_IsZero((PyLongObject *)arg));
}

if (PyTclObject_Check(arg)) {