Commit 324e284 (parent 2e96fb6): 1 changed file with 1 addition and 1 deletion.
324e284
I looked at the source code. In `_mpfr_hash`, `f` is just a pointer to an `mpfr_t`. It is not a reference to a Python object. We really need to continue to use `_Py_HashPointer`.

I'm getting frustrated by the continual changes in CPython. The code under discussion is only called when the value is `nan`. There is no way to pass that concept to `PyBaseObject_Type.tp_hash`.

If they won't continue to provide it, I'd rather implement it directly in gmpy2. Untested. Probably needs another review of the various types involved:
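```c
#include <Python.h>

Py_hash_t
GMPy_HashPointerNan(const void *p)
{
    /* Use an unsigned type so both shifts below are well defined. */
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    Py_hash_t x = (Py_hash_t)y;
    /* -1 is reserved for signalling an error from a hash function */
    if (x == -1) {
        x = -2;
    }
    return x;
}
```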
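To check the rotate-by-4 scheme outside of gmpy2, here is a minimal standalone sketch. It assumes `size_t` is as wide as a pointer (the same assumption `SIZEOF_VOID_P` encodes in the function above) and replaces `Py_hash_t` with a hypothetical `hash_t` typedef so it builds without `Python.h`; `hash_pointer` is an illustrative name, not a gmpy2 or CPython API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_hash_t (CPython uses Py_ssize_t). */
typedef ptrdiff_t hash_t;

/* Rotate-by-4 pointer hash, mirroring the scheme sketched above. */
static hash_t
hash_pointer(const void *p)
{
    /* Assumption: size_t and pointers have the same width. */
    assert(sizeof(size_t) == sizeof(void *));
    size_t y = (size_t)p;
    /* Move the usually-zero low nibble (alignment bits) to the top. */
    y = (y >> 4) | (y << (8 * sizeof(void *) - 4));
    hash_t x = (hash_t)y;
    /* -1 is reserved as the error value for hash functions. */
    if (x == -1) {
        x = -2;
    }
    return x;
}
```

For example, `hash_pointer((const void *)0x1000)` returns 256 (0x100) on both 32- and 64-bit builds: the low zero nibble is rotated to the top, and the bits moved there are all zero. Since a rotation is a bijection, two distinct pointers can only collide through the -1 to -2 remapping.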
324e284
@casevh Yes, but this shouldn't be a problem unless the hash implementation for the object type someday attempts to dereference the pointer.

Actually, there is: e.g. we could inline the `_mpfr_hash()` on L159 and pass the pointer to the `self` argument (of type `MPFR_Object`) to `object.__hash__()`. But as I said, this doesn't matter, as the actual `PyBaseObject_Type.tp_hash` implementation wants a void pointer in the end. This is an implementation detail of CPython, but I doubt it will be changed.

On the other hand, the code (not the interface) for `_Py_HashPointer()` might easily change, e.g.: python/cpython@f453221
324e284
`PyBaseObject_Type.tp_hash` does attempt to dereference the pointer and will attempt to call the hash method of the type (if found), or it will raise HashNotImplemented. But we are not trying to call a generic hash function.

Prior to Python 3.10, `hash(nan)` would always return 0. Since NaNs always compare unequal, the bucket that collects the hash value 0 would contain all the references to numeric instances that compare equal to 0 as well as all numeric instances that are NaN and therefore don't compare equal to anything. This could lead to slowdowns from extra comparisons. In Python 3.10 and later, `hash(nan)` was changed to return a pseudo-random number derived from a pointer. And that is the concept that we can't pass to `PyBaseObject_Type.tp_hash`.

We aren't trying to calculate `object.__hash__()`. We are already in the middle of calculating the hash. To determine the hash of a NaN, all we need is a pseudo-random number to use in place of 0. The function that creates that pseudo-random value, in an attempt to decrease collisions, is `_Py_HashPointer`. In this specific case, we need to use `_Py_HashPointer`.
324e284
No. This slot is just equal to `_Py_HashPointer`: https://github.com/python/cpython/blob/970e719a7a829bddc647bbaa668dd8603abdddef/Objects/typeobject.c#L6586

That is a correct description of the builtin `hash()` function. But we don't call this builtin; we call `object.__hash__()` directly.

We have ifdefs for this legacy stuff.

We can. And so far we do. My point above was that `PyBaseObject_Type.tp_hash` doesn't care about the pointer type, so the algorithm for deriving a pseudo-random number from a pointer (we just pass an `mpfr_t` instead of a `void*`) will work (that's on the current master or #453, with py3.12):