Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersecting parameters introduced "unhashable type" bug. #1707

Closed
jpivarski opened this issue Sep 20, 2022 · 2 comments · Fixed by #1708
Closed

Intersecting parameters introduced "unhashable type" bug. #1707

jpivarski opened this issue Sep 20, 2022 · 2 comments · Fixed by #1708
Assignees
Labels
bug The problem described is something that must be fixed

Comments

@jpivarski
Copy link
Member

Version of Awkward Array

1.10.0

Description and code to reproduce

PR #1679 added parameter-carrying to broadcasting, but if the parameter values are dicts, rather something hashable, like strings, the implementation fails.

>>> import awkward._v2 as ak
>>> array = ak.with_parameter(ak.Array([[1, 2], [], [3]]), "attrs", "hashable")
>>> array
<Array [[1, 2], [], [3]] type='3 * [var * int64, parameters={"attrs": "hash...'>
>>> array + array
<Array [[2, 4], [], [6]] type='3 * [var * int64, parameters={"attrs": "hash...'>

versus

>>> array = ak.with_parameter(ak.Array([[1, 2], [], [3]]), "attrs", {"not": "hashable"})
>>> array
<Array [[1, 2], [], [3]] type='3 * [var * int64, parameters={"attrs": {"not...'>
>>> array + array
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jpivarski/mambaforge/lib/python3.10/site-packages/numpy/lib/mixins.py", line 21, in func
    return ufunc(self, other)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/highlevel.py", line 1339, in __array_ufunc__
    return ak._v2._connect.numpy.array_ufunc(ufunc, method, inputs, kwargs)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_connect/numpy.py", line 278, in array_ufunc
    out = ak._v2._broadcasting.broadcast_and_apply(
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_broadcasting.py", line 1019, in broadcast_and_apply
    out = apply_step(
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_broadcasting.py", line 998, in apply_step
    return continuation()
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_broadcasting.py", line 720, in continuation
    outcontent = apply_step(
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_broadcasting.py", line 409, in apply_step
    parameters_factory = parameters_factory_impl(inputs)
  File "/Users/jpivarski/irishep/awkward/awkward/_v2/_broadcasting.py", line 277, in _intersection_parameters_factory
    functools.reduce(operator.and_, list(parameters_to_intersect))
TypeError: unhashable type: 'dict'

It fails in the functools.reduce of

intersected_parameters = None
parameters_to_intersect = []
# Build a list of set-like dict.items() views.
# If we encounter None-parameters, then we stop early
# as there can be no intersection.
for parameters in input_parameters:
if ak._v2.forms.form._parameters_is_empty(parameters):
break
else:
parameters_to_intersect.append(parameters.items())
# Otherwise, build the intersected parameter dict
else:
intersected_parameters = dict(
functools.reduce(operator.and_, parameters_to_intersect)
)

in the hashable case, the parameters_to_intersect is

[dict_items([('attrs', 'hashable')]), dict_items([('attrs', 'hashable')])]

and in the non-hashable case, it's

[dict_items([('attrs', {'not': 'hashable'})]), dict_items([('attrs', {'not': 'hashable'})])]

The operator.and_ is a bitwise & operator—I doubt that's what you want there. I think this can be done more simply in a for loop.

Also, possibly related, we're losing parameters in simple arrays:

>>> array = ak.with_parameter(ak.Array([1, 2, 3]), "attrs", {"not": "hashable"})
>>> array
<Array [1, 2, 3] type='3 * int64[parameters={"attrs": {"not": "hashable"}}]'>
>>> array + array
<Array [2, 4, 6] type='3 * int64'>

In this case, it reaches the break and therefore skips the reduce. (Why would _parameters_is_empty be true for this array that clearly has parameters? I think the first step in broadcasting creates a RegularArray around this NumpyArray—it doesn't collapse the RegularArray + NumpyArray into a multidimensional NumpyArray, and lose the parameter, does it?)

Discovered by @philippemiron.

@jpivarski jpivarski added the bug The problem described is something that must be fixed label Sep 20, 2022
@jpivarski
Copy link
Member Author

I've assigned it to you, @agoose77, so that you can take a first look (since you wrote #1679), but I can look at it too, with you, if you want.

@agoose77
Copy link
Collaborator

agoose77 commented Sep 20, 2022

but if the parameter values are dicts, rather something hashable, like strings, the implementation fails.

An oversight on my part.

The operator.and_ is a bitwise & operator—I doubt that's what you want there. I think this can be done more simply in a for loop.

Yes, in the case of set(), x & y computes the intersection of x and y

agoose77 added a commit that referenced this issue Sep 20, 2022
agoose77 added a commit that referenced this issue Sep 21, 2022
jpivarski pushed a commit that referenced this issue Sep 21, 2022
* fix: handle non-hashable parameters, fixes #1707

* test: validate fix for #1707

* refactor: move parameter util function to `form`

* refactor: result of `reduce` is already dict

* refactor: pre-determine set of keys to compare

* refactor: make parameter broadcasting public

* fix: (partial) broadcast parameters within ufunc mechanism

* fix: (tentative) don't attempt to convert `RegularArray`s into `NumpyArray`s during ufunc broadasting

This ensures that parameters remain on the correct layouts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants