Add public function PyLong_GetDigits() #31
Comments
I suggest a different API:
This assumes some implementation details which we would not be able to change in the future:
With that on the table, please consider also a second variant (de facto, two functions already exist as private):

/* Return the number of digits. */
Py_ssize_t PyUnstable_PyLong_DigitCount(PyLongObject *obj);

/* Return the array of digits. */
digit* PyUnstable_PyLong_GetDigits(PyLongObject *obj);

/* Return a new integer object with an unspecified absolute value of the
   given size (in digits) and with the given sign. On failure return NULL. */
PyLongObject* PyUnstable_PyLong_New(const Py_ssize_t ndigits, const int sign);

But that looks like a half-solution to me.
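For illustration, here is a minimal hedged sketch of how a GMP-based extension might use these declarations in the write direction. The PyUnstable_* names are those of the variant above (not an existing CPython API), and the sketch assumes CPython's current 30-bit digits stored in 32-bit words, with Python.h, gmp.h and the digit typedef available.

/* Hedged sketch: build a Python int from a GMP mpz_t via the declarations
   above. Assumes 30-bit digits held in 32-bit words (a CPython detail). */
static PyObject *
PyLong_from_mpz(const mpz_t z)
{
    size_t nbits = mpz_sizeinbase(z, 2);                 /* bits in |z| */
    Py_ssize_t ndigits = (Py_ssize_t)((nbits + 29)/30);  /* 30 bits per digit */
    int sign = mpz_sgn(z);

    PyLongObject *v = PyUnstable_PyLong_New(ndigits, sign);
    if (!v) {
        return NULL;
    }
    digit *digits = PyUnstable_PyLong_GetDigits(v);
    size_t written = 0;
    /* Least-significant word first, 4-byte words, native byte order,
       2 unused "nail" bits per word. */
    mpz_export(digits, &written, -1, sizeof(digit), 0, 32 - 30, z);
    while ((Py_ssize_t)written < ndigits) {
        digits[written++] = 0;  /* zero digits mpz_export did not fill */
    }
    return (PyObject *)v;
}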
That's true. On the other hand, I'm not sure how essential the freedom to change these details really is. A "big integer" is just an array of "digits" ("limbs" in GMP terminology), which seems to be the common denominator of all implementations of arbitrary-precision integer arithmetic. Also, I would expect most future enhancements of CPython ints to concern "small" integers that fit into a machine-sized integer, rather than new arithmetic algorithms with better asymptotic properties (beyond Karatsuba); for the latter I would prefer switching to a production-quality library such as GMP. We could mention that for "small" integers it's better to use dedicated functions.
Yes, this is assumed in this proposal. But we can avoid this by offering a function that provides such data, e.g. (like python/cpython#102471 (comment)):

typedef struct _PyIntExportLayout {
    uint8_t bits_per_digit;
    int8_t word_endian;
    int8_t array_endian;
    uint8_t digit_size;
} PyIntExportLayout;

void Py_GetIntLayout(PyIntExportLayout *layout);
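As a hypothetical usage sketch (the names are those of the proposal above, not an existing CPython API), a consumer could verify its layout assumptions once and fall back otherwise:

/* Hedged sketch: query the layout once and check the assumptions the
   fast digit-array path relies on. */
PyIntExportLayout layout;
Py_GetIntLayout(&layout);
if (layout.digit_size != 4 || layout.bits_per_digit != 30) {
    /* The fast path does not apply; use a generic conversion such as
       PyLong_AsNativeBytes() instead. */
}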
Yes. But not all integers are big. For compact ints larger than a digit (e.g. 32 bits set in a native int, with 30-bit digits), the proposed API can't work.
This still assumes that all ints share the same layout.
This assumes that CPython (or another implementation that provides the C API) won't ever switch to a production-quality bigint library. Consider something similar to what I proposed for
If the user guesses the underlying data format correctly, they get a zero-copy shared buffer, and “release” is a flag check + decref. If the guess is wrong, they get a newly allocated buffer -- slower but still correct.
The downside is, of course, that it's harder to implement on CPython's side.
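A declaration-level sketch of that "guess the layout" idea might look as follows. All names here are hypothetical (not an existing API) and are only meant to make the shape concrete; a later comment refers to this pair as Export + ReleaseExport.

/* Hedged sketch of a "request a layout" export (hypothetical names).
   The caller describes the layout it can consume; the implementation
   either hands out a zero-copy view of the internal digits or converts
   into a freshly allocated buffer. */
typedef struct {
    const void *data;    /* digit array in the requested layout */
    Py_ssize_t ndigits;  /* number of digits at data */
    int needs_free;      /* non-zero if data was allocated during export */
} PyLongExportView;

/* Return 0 on success, -1 with an exception set on failure. */
int PyLong_ExportAs(PyObject *obj, const PyIntExportLayout *requested,
                    PyLongExportView *view);

/* Flag check + decref in the zero-copy case; frees the converted buffer
   otherwise. */
void PyLong_ReleaseExport(PyObject *obj, PyLongExportView *view);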
I'd prefer an API shaped like … Our digits don't currently fill a multiple of 8 bits, which means we would need to repack them to provide a contiguous buffer. At that point, … But the critical things for us are that we shouldn't do any conversions from the internal representation, and that we are able to change the representation in the future (potentially by rejecting calls to the API and letting the caller use the main functions).
It seems like different things should be decided / designed:
I suggest renaming this function to … For example, for a small integer, a small array for the single digit could be allocated. As another example, the export function could store a strong reference to the Python int object, to make sure that it remains valid while the export is being used; the release function would just DECREF it. As for the "layout", maybe it should be constant, as Mark proposed, rather than set at each call. Something like the sketch below:
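The snippet that followed is not preserved here; as a rough, hedged illustration of a constant layout, the fields below loosely follow the PyLongLayout structure that PEP 757 later standardized and are only meant to be indicative.

/* Illustrative sketch of a constant, queryable layout description,
   loosely modeled on the PyLongLayout that PEP 757 later adopted. */
typedef struct PyLongLayout {
    uint8_t bits_per_digit;   /* number of useful bits per digit */
    uint8_t digit_size;       /* size of one digit, in bytes */
    int8_t digits_order;      /* -1: least significant digit first, 1: most */
    int8_t digit_endianness;  /* -1: little-endian digits, 1: big-endian */
} PyLongLayout;

/* Return a pointer to a static, read-only layout description. */
const PyLongLayout *PyLong_GetNativeLayout(void);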
I think that we should design the opposite "import" function directly, to have a consistent API. It could be something like the sketch below:
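As a hedged illustration only, one possible shape for such an "import" direction is roughly the writer API that PEP 757 later specified:

typedef struct PyLongWriter PyLongWriter;  /* opaque builder object */

/* Allocate a writer for an int with the given sign and digit count and
   expose its digit array through *digits for the caller to fill. */
PyLongWriter *PyLongWriter_Create(int negative, Py_ssize_t ndigits,
                                  void **digits);
/* Turn the filled writer into an int object (consumes the writer). */
PyObject *PyLongWriter_Finish(PyLongWriter *writer);
/* Abandon the writer without creating an object. */
void PyLongWriter_Discard(PyLongWriter *writer);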
But this is a safe assumption for non-compact values, right? I think it's fine if
Was such an extension of the
Again, it will probably be true for "big enough" ints. The proposed API (in both variants) is actually dedicated to this case.
That seems like a much more complex approach, just to support small ints. Do we really need to pay this price?
But that means breaking the ABI if we change PyLong_NATIVE_LAYOUT someday, doesn't it?
Well, Mark's referenced issue has a proposal for mpz_import/export-like functions. That might be an alternative, but it requires much more effort on the CPython side. This proposal instead leaves most of the work to the numeric libraries: CPython just offers access to the absolute value of "big enough" integers as an array of "digits" with a given layout (all that is required to fill …
The C API is implemented by other Python implementations which can use a different layout, have different constraints, and so need to do different things on Export + ReleaseExport.
An ABI change would mean that the ABI of the PyLongLayout structure itself would change. But its values are free to change between two Python versions, no?
I agree, but I also want to point out that I proposed an API scheme that covers all of these cases and would mean we can add them much more cheaply. If we're going to start adding many specialised APIs of this style, I'd like us to consider just adding a general API for it, so each new "export struct" can just be an extension of it rather than an entirely new API each time.
Hmm, I think so :) That certainly fits the scope of the current proposal. But can we provide writable buffers for integers (which are supposed to be immutable)? I would also like to streamline conversion from GMP-like libraries.
The layout argument here is redundant too, unless we want to do
That's true, but a difference in the layout is easy to handle if we have an API to query it, as proposed above. Other differences might be related to the case of "small" integers, where maybe someone (or CPython someday) will use a different structure for the long object (i.e. not a plain array of "digits"). I'm not sure it is worth supporting that case with the new API. Why not fail in this case as fast as possible, so users can switch to the API for small integers? Consider:

static void
mpz_set_PyLong(mpz_t z, PyObject *obj)
{
Py_ssize_t len;
const digit *digits = PyLong_GetDigits(obj, &len);
if (!digits) {
/* Assuming obj is an integer, above call *might* fail for small
integers. We catch this and use special functions. */
mpz_set_si(z, PyLong_AsLong(obj));
}
else {
/* Else, we have "array of digits" view with some layout, which
may vary with implementation. */
PyIntExportLayout layout;
Py_GetIntLayout(&layout);
mpz_import(z, len, layout.array_endian, layout.digit_size, layout.word_endian,
layout.digit_size*8 - layout.bits_per_digit, digits);
int sign = 1;
PyLong_GetSign(obj, &sign);
if (sign < 0) {
mpz_neg(z, z);
}
}
}
Speaking as one individual member of the WG:
No, not directly. Perhaps something along the lines of:

PyLongBuilder *PyLongBuilder_New(PyLongLayout layout, size_t count, void **data);

void *data_to_fill;
PyLongBuilder *b = PyLongBuilder_New(layout, count, &data_to_fill);
memcpy(data_to_fill, your_data, count);
PyObject *obj = PyLongBuilder_Finish(b);
// If "layout" happens to match CPython's internal layout for this kind of int,
// `PyLongBuilder_Finish` fills in a PyObject/PyLong header and casts to PyObject*.
// Otherwise, it allocates a new object and copies the data to it.
This assumes … Perhaps something like:

Py_ssize_t len;
PyIntExportLayout layout;
void *data = PyLong_Export(obj, &len, &layout);

For importing, maybe a function to get the layout to pass to the above:

int PyLong_GetLayout(Py_ssize_t bit_length, int sign, PyIntExportLayout *result);
Each type of "exportable" object needs a different set of "layout" arguments. Strings need an encoding, ints need the 4 variables mentioned in this thread, sequences/iterables will perhaps need a chunk size... Meta-discussion: we're brainstorming rather than making decisions; I think Discourse would be a better place for this kind of conversation.
I wrote the PR python/cpython#121339 to implement a whole "import-export" API for Python int objects. I suggest moving the discussion there; once we agree on an API, we can come back here.
I created #35 for this API.
…e function signature of the `_PyLong_AsByteArray` API. See python/cpython#111140 and capi-workgroup/decisions#31.
@skirpichev: Since PEP 757 has been written, is this API still relevant?
Only for history.
That will allow fast import of CPython integers if the external multiple-precision library supports an mpz_import()-like interface (e.g. LibTomMath has mp_unpack()). Currently gmpy2 (see this) or Sage (see this) use private API to do this. Using the new PyLong_FromNativeBytes()/PyLong_AsNativeBytes() will be a performance regression for such projects.

Proposed interface:
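The code block that followed is not preserved in this copy; judging from the usage example quoted earlier on this page, the proposed signature was roughly the following (a hedged reconstruction, not the author's exact text):

/* Return a read-only view of the digits of the absolute value of obj and
   store the digit count in *len; return NULL if obj is not suitable
   (e.g. a "small"/compact int). Reconstructed from the usage example on
   this page; not an existing CPython API. */
const digit *PyLong_GetDigits(PyObject *obj, Py_ssize_t *len);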
API usage:
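The original usage snippet is likewise not preserved; a minimal hypothetical example, consistent with the mpz_set_PyLong() function shown earlier on this page, could be:

/* Hedged sketch: read the digits of a "big enough" Python int. */
Py_ssize_t ndigits;
const digit *digits = PyLong_GetDigits(obj, &ndigits);
if (digits) {
    /* digits[0] .. digits[ndigits - 1] hold the absolute value in the
       layout reported by Py_GetIntLayout(). */
}
else {
    /* A compact int (or an error): use the small-integer API instead. */
}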
The above interface resembles GMP's functions mpz_size() and mpz_limbs_read() combined. It might also be worth considering export from external multiple-precision libraries that offer an mpz_export()-like API. gmpy2 does this using private API to access the digits/digit count and also _PyLong_New() to allocate an integer of the appropriate size.

So, an alternative interface to support both reading and writing might look like this:
PyLong_New() shouldn't violate #56: a freshly created integer will be a correct one, just with an unspecified absolute value.