Skip to content

Commit

Permalink
Fix compiler warning
Browse files Browse the repository at this point in the history
  • Loading branch information
skirpichev committed Nov 2, 2023
1 parent 2e96fb6 commit 324e284
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/gmpy2_hash.c
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ _mpfr_hash(mpfr_t f)
}
else {
#if PY_VERSION_HEX >= 0x030A00A0
return PyBaseObject_Type.tp_hash(f);
return PyBaseObject_Type.tp_hash((PyObject*)f);
#else
return _PyHASH_NAN;
#endif
Expand Down

4 comments on commit 324e284

@casevh
Copy link
Collaborator

@casevh casevh commented on 324e284 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the source code. In _mpfr_hash, f is just a pointer to an mpfr_t. It is not reference Python object. We really need to continue to use _Py_HashPointer.

I'm getting frustrated by the continual changes in CPython. The code under discussion is only called when the value is nan. There is no way to pass that concept to PyBaseObject_Type.tp_hash.

If they won't continue to provide, I'd rather implement it directly in gmpy2. Untested. Probably needs another review of the various types involved

Py_hash_t
GMPy_HashPointerNan(const void p)
{
Py_hash_t y = (size_t)p;
/
bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
excessive hash collisions for dicts and sets */
y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
if (y == -1) {
y = -2;
}
return y;
}

@skirpichev
Copy link
Contributor Author

@skirpichev skirpichev commented on 324e284 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the source code. In _mpfr_hash, f is just a pointer to an mpfr_t. It is not reference Python object.

@casevh Yes, but this shouldn't be a problem, unless hash implementation for the object type
someday will do an attempt to dereference the pointer.

There is no way to pass that concept to PyBaseObject_Type.tp_hash

Actually, there is: e.g. we could inline the _mpfr_hash() on L159 and pass the pointer
to self argument (of type MPFR_Object) to object.__hash__(). But as I said: this doesn't matter,
as the actual PyBaseObject_Type.tp_hash implementation wants a void pointer in the end... This
is an implementation detail of the CPython, but I doubt it will be changed.

On another hand, the code (not interface) for _Py_HashPointer() might change easily, e.g.:
python/cpython@f453221

@casevh
Copy link
Collaborator

@casevh casevh commented on 324e284 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyBaseObject_Type.tp_hash does attempt to dereference the pointer and will attempt to the call the hash method of the Type (if found) or it will raise HashNotImplemented. But we are not trying to call a generic hash function.

Prior to Python 3.10, hash(nan) would always return a value of 0. Since nan's always compare unequal, the bucket (??) that collects the hash value of 0 would contain all the references to actual numeric instances that compare equal to 0 and all numeric instances that are NaN and therefore don't compare equal to anything. This could lead to slowdowns for comparisons. In Python 3.10 and later, hash(nan) was changed to return a pseudo-random number derived from a pointer. And that is the concept that we can't pass to PyBaseObject_Type.tp_hash. We aren't trying to calculate an object.hash(). We are in the middle of calculating the hash already. And to determine the hash of a nan, all we need is a pseudo-random number to use in place of 0. The function that creates that pseudo-random value in an attempt to decrease collisions is _Py_HashPointer.

In this specific case, we need to use _Py_HashPointer.

@skirpichev
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyBaseObject_Type.tp_hash does attempt to dereference the pointer

No. This slot is just equal to _Py_HashPointer:
https://github.com/python/cpython/blob/970e719a7a829bddc647bbaa668dd8603abdddef/Objects/typeobject.c#L6586

and will attempt to the call the hash method of the Type (if found) or it will raise HashNotImplemented.

That is correct for description of the builtin hash() function. But we don't call
this builtin, but the object.__hash__() directly.

Prior to Python 3.10, hash(nan) would always return a value of 0.

We have ifdefs for this legacy stuff.

derived from a pointer. And that is the concept that we can't pass to PyBaseObject_Type.tp_hash

We can. And we do so far. My point above was that the PyBaseObject_Type.tp_hash doesn't
care about pointer type, so algorithm for derivation of a pseudo-random number from a pointer
(we just pass mpfr_t instead of void*) will work (that's on the current master or #453, with py3.12):

>>> a = float('nan')
>>> b = float('nan')
>>> hash(a)
8792410105983
>>> hash(b)
8792410106311
>>> from gmpy2 import mpfr
>>> a = mpfr('nan')
>>> b = mpfr('nan')
>>> hash(a)
8792410515796
>>> hash(b)
8792409903548

Please sign in to comment.