C API: Consider adding public PyLong_AsByteArray() and PyLong_FromByteArray() functions #111140
Comments
Thanks for creating the issue. I agree that the functions should be added. The current replacements seem awful for this kind of basic functionality. Going through an expensive Python call like this for converting between Note that at least a couple of projects that you list use Cython implemented parts and thus probably just mention the function in there. I'm sure something like |
It was already discussed several times. This API lacks some features which would be needed for general use. You need to know the size of the resulting bytes array. Calculating it is not trivial, especially for negative integers. Also, it would be more convenient to support "native" endianness, not just "big"/"little". I have been thinking about implementing a similar API for |
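To make the point about sizing negative integers concrete, here is a minimal sketch in plain C for a fixed-width `int64_t`. The helper name `signed_bytes_needed` is hypothetical and is not part of the CPython C API; an arbitrary-precision int needs the same reasoning applied to its digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: number of bytes needed to
 * store `value` in two's complement, sign bit included. */
static size_t
signed_bytes_needed(int64_t value)
{
    /* For negative values, invert the bits: -1 becomes 0, -129 becomes 128.
     * The remaining set bits are exactly the ones that must be kept. */
    uint64_t magnitude = value < 0 ? ~(uint64_t)value : (uint64_t)value;

    size_t bits = 0;
    while (magnitude != 0) {
        bits++;
        magnitude >>= 1;
    }
    /* The top bit of `magnitude` is always clear, so at most 63 bits. */
    assert(bits <= 63);

    /* One extra bit for the sign, then round up to whole bytes. */
    return (bits + 1 + 7) / 8;
}
```

The asymmetry is visible at the boundaries: -128 fits in one byte while 128 needs two, which is why a naive "bit length divided by 8" gives the wrong answer for signed output.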
I'm not sure that passing the endian as a string is efficient if this function is part of hot code. |
Not as a string. Just a 3-variant value native/little/big instead of a boolean little/big. |
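A three-variant selector of that kind could look roughly like the sketch below. The enum, its values, and the function names are invented for illustration and do not correspond to any CPython definition; the example only shows how a "native" choice resolves to little or big at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical three-way selector; names and values are illustrative. */
typedef enum {
    BYTE_ORDER_NATIVE = -1,
    BYTE_ORDER_BIG = 0,
    BYTE_ORDER_LITTLE = 1
} byte_order_t;

/* Detect the host byte order by looking at the first byte of the value 1. */
static int
native_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Map NATIVE to the concrete order of the host; pass others through. */
static byte_order_t
resolve_byte_order(byte_order_t requested)
{
    if (requested != BYTE_ORDER_NATIVE) {
        return requested;
    }
    return native_is_little_endian() ? BYTE_ORDER_LITTLE : BYTE_ORDER_BIG;
}

/* Write a 32-bit value into out[0..3] using the requested byte order. */
static void
store_u32(uint32_t value, unsigned char out[4], byte_order_t order)
{
    assert(out != NULL);
    order = resolve_byte_order(order);
    for (int i = 0; i < 4; i++) {
        int shift = (order == BYTE_ORDER_LITTLE) ? 8 * i : 8 * (3 - i);
        out[i] = (unsigned char)(value >> shift);
    }
}
```

With a selector like this, callers that only want the in-memory layout of a C integer pass the native variant and never need to know which order the host uses.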
Sorry, I was confused between C API (int for the endian) and the Python API (string for the endian). |
Ok, then please put the existing function back in until there is a public replacement. |
I created PR #111162 |
Which features are missing? Do you have links to previous discussions if it was discussed previously? |
wcstombs() can be called with a NULL buffer to compute the buffer size. That avoids having to provide a second API to "calculate the buffer size". I suppose that a common use case is also to convert a Python int object to C int type for which there is no C API. Like int128_t |
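For readers unfamiliar with the wcstombs() convention mentioned here, the two-call pattern looks roughly like this. The wrapper name `wide_to_multibyte` is hypothetical, and the NULL-destination behaviour relied on is the one POSIX specifies, so treat this as a sketch rather than fully portable C.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper: convert a wide string to a freshly allocated
 * multibyte string. Returns NULL on failure; the caller frees the result. */
static char *
wide_to_multibyte(const wchar_t *wide)
{
    assert(wide != NULL);

    /* First call: NULL destination, so only the required size is computed.
     * The returned count excludes the terminating null byte. */
    size_t needed = wcstombs(NULL, wide, 0);
    if (needed == (size_t)-1) {
        return NULL;  /* a character could not be converted */
    }

    char *buffer = malloc(needed + 1);
    if (buffer == NULL) {
        return NULL;
    }

    /* Second call: same input, now with a buffer of the right size. */
    wcstombs(buffer, wide, needed + 1);
    return buffer;
}
```

The appeal for an int-to-bytes function is the same: one entry point can both report the required size and perform the conversion, so no separate sizing API is needed.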
I suppose that a common use case is also to convert a Python int object to C int type for which there is no C API. Like int128_t
Or something like 256 bytes for a crypto key, hash value or similar data type. Probably a known size, so that users can live with an exception if it doesn't fit (because it's an error or can't-handle-that situation).
That said, a function to ask for the bit length of the integer value could be helpful in order to find a suitable integer/storage size. And also more generally to estimate the size of an integer value.
Both together would probably cover a lot of use cases.
|
I'm partial to API like
This allows handling the common cases with a minimum of function calls:
But, IMO we also need general API for exporting/importing Python objects to/from buffers (see e.g. #15 in capi-workgroup/problems), and it would be good to make this consistent. I'd prefer adding the original functions back until we design a proper replacement. |
Note that it goes together with an So, basically, the proposal is to add
It's not strictly related, though. I think a PyLong number is sufficiently different from an arbitrary Python object array to not require a highly similar interface. If it can be made similar, fine. I wouldn't hold up one for the other, though. Regarding Serhiy's concerns about missing ABI-stability of enum flags and arguments: we've used C macro names for this for ages, and they turn into simple integers that can be passed as C |
* gh-106320: Re-add _PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() to the public header files since they are used by third-party packages and there is no efficient replacement. See #111140 See #111139 * gh-111262: Re-add _PyDict_Pop() to have a C-API until a new public one is designed.
See also comments about removed _PyLong_New(): #108604 (comment) |
_PyLong_FromByteArray(), _PyLong_AsByteArray() and _PyLong_GCD() functions were restored by commit a8a89fc. |
Reopening, because I think at a minimum we should have the two functions mentioned in the title. My proposed API (I have an implementation, but not PR ready yet, and one open question) is basically the one Petr liked but simpler:
I'm comfortable making these only do default endianness, because they're really intended as an extension of all the other int conversions we have that also do default endianness. Alternate endianness is a specialised formatting or bit packing operation. The "designed size may be larger than strictly necessary" is to allow returning I envision the use here to be like this (note the EAFP):
Similarly for However, the bit I'm wavering on is what to do with unsigned values with the MSB set. Right now, you need to allow an extra byte to "prove" that
I don't think we have perf concerns at the point where this matters, as we're already at the extremes of a 64-bit integer (for most platforms). That's too big for a "compact" int, so we're on the slow path already. But I do want to get the usability right. I'm leaning towards a |
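The extra byte for unsigned values with the MSB set can be illustrated in plain C with a 64-bit value. The helper names are hypothetical and nothing here is CPython API; the sketch only counts how many bytes a non-negative value needs under each interpretation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of significant bits in `value` (0 for the value 0). */
static size_t
bit_length_u64(uint64_t value)
{
    size_t bits = 0;
    while (value != 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Bytes needed when the reader promises to treat the buffer as unsigned. */
static size_t
bytes_if_unsigned(uint64_t value)
{
    size_t bits = bit_length_u64(value);
    return bits == 0 ? 1 : (bits + 7) / 8;
}

/* Bytes needed when the buffer is read as signed two's complement:
 * a non-negative value needs one more bit so the sign bit stays clear. */
static size_t
bytes_if_signed(uint64_t value)
{
    size_t bits = bit_length_u64(value);
    assert(bits <= 64);
    return (bits + 1 + 7) / 8;
}
```

A value such as `UINT64_MAX` fits in 8 bytes when read as unsigned, but a signed reading of those same 8 bytes is -1, so a signed-only interface has to ask for a ninth byte to show that the value is positive.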
I'd prefer exposing both endianness and signedness as arguments. As I see it, the functions should be intended for serialization too, not just for converting to native ints -- and in that case, it's best to be explicit. Perhaps we should use named flags, like:
|
All the scenarios I've seen have just been about converting to native ints (in contexts where serialization may happen next, but has to happen via a native int). Can you/anyone show me some where the caller doesn't want the int value, but just wants to store the bytes? (And doesn't want to/can't use the struct module, which is intended for this case.) FWIW, non-default endianness is inevitably a slow path. We can make this very fast for normal sized, native endian values, which are the vast majority of cases, but forcing an endianness has to slow things down. |
How about this as a proposed API:
Where the first two are essentially exported aliases that make it easier to read/write code without having to remember/write a set of flags every time? |
and
This is fine, but my counterpoint is that there's no other way to do it in our C API (and the way to do it in Python is to And I think the documentation for this makes the most sense framed as "behaves like |
I like the direction this is going, yes, that is the way I was hoping an |
…ve the test (python#115380) This expands the examples to cover both realistic use cases for the API. I noticed thing in the test that could be done better so I added those as well: We need to guarantee that all bytes of the result are overwritten and that too many are not written. Tests now pre-fills the result with data in order to ensure that. Co-authored-by: Steve Dower <[email protected]>
…just *endianness* (pythonGH-116053)
See python/cpython#111140 Also clean up and simplify the fallback implementation, fixing some reference leaks along the way.
The interface seems complete and usable now. Is this done now or is there anything left for this ticket to stay open? |
Looking things over, I like the C API that was settled upon. It seems to address all of the needs from our earlier discussions. |
…e function signature of the `_PyLong_AsByteArray` API See python/cpython#111140 and capi-workgroup/decisions#31
Feature or enhancement
The private `_PyLong_AsByteArray()` and `_PyLong_FromByteArray()` functions were removed in Python 3.13: see PR #108429.
@scoder asked what the intended replacement for `_PyLong_FromByteArray()` is.
The replacement for `_PyLong_FromByteArray()` is `PyObject_CallMethod((PyObject*)&PyLong_Type, "from_bytes", "s#s", str, len, "big")`, but I'm not sure what the easy way is to set the signed parameter to True (default: `signed=False`).
The replacement for `_PyLong_AsByteArray()` is `PyObject_CallMethod(my_int, "to_bytes", "ns", length, "big")`. Likewise, I'm not sure how to easily set the signed parameter to True (default: `signed=False`).
I propose to add public PyLong_AsByteArray() and PyLong_FromByteArray() functions to the C API.
Python 3.12 modified PyLongObject: it's no longer a simple array of digits, but a less straightforward `_PyLongValue` structure which requires using unstable functions to access small "compact" values:
So having a reliable and simple way to import/export a Python int object as bytes became even more important.
A code search for `_PyLong_AsByteArray` in PyPI top 5,000 projects found 12 projects using it:
Linked PRs