TypeError: meta must be an instance of an Awkward Array, not <class 'module'> triggered by fill_flattened for a dask-histogram and dask-awkward arrays #531

Open
NJManganelli opened this issue Jul 29, 2024 · 2 comments

Calling the fill_flattened method of a hist.dask.Hist object with dask-awkward inputs triggers the error shown below. A concrete Hist filled with the computed awkward arrays succeeds. (The reported version may be wrong: I installed most of the scikit-hep packages as development versions, all from GitHub approximately 3 weeks ago.)

>>> import dask_awkward as dak
>>> import awkward as ak
>>> import hist
>>> from hist.dask import Hist
>>> d = dak.from_awkward(ak.Array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]), npartitions=2)
>>> e = d.compute()
>>> e
<Array [0.1, 0.2, 0.3, 0.4, 0.5, ..., 0.7, 0.8, 0.9, 1] type='10 * float64'>
>>> dd = dak.from_awkward(ak.Array([[1, 2], [3], [4], [5,6, 7], [8, 4], [2, 3], [1], [], [5, 9, 10], [5, 1]]), npartitions=2)
>>> ak.num(d.compute(), axis=0)
array(10)
>>> ak.num(dd.compute(), axis=0)
array(10)
>>> hd = Hist(hist.axis.Regular(10, 0, 10), hist.storage.Weight())
>>> hd.fill_flattened(dd, weight=d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/basehist.py", line 274, in fill_flattened
    destructured = interop.destructure(arg)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/interop.py", line 65, in destructure
    for module in find_histogram_modules(obj):
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/interop.py", line 48, in find_histogram_modules
    yield arg._histogram_module_
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1596, in __getattr__
    return self.map_partitions(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1649, in map_partitions
    return map_partitions(func, self, *args, traverse=traverse, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 2168, in map_partitions
    return _map_partitions(
           ^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 2036, in _map_partitions
    return new_array_object(
           ^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1849, in new_array_object
    raise TypeError(
TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.

>>> he = hist.hist.Hist(hist.axis.Regular(10, 0, 10), hist.storage.Weight())
>>> he.fill_flattened(dd.compute(), weight=d.compute())
Hist(Regular(10, 0, 10, label='Axis 0'), storage=Weight()) # Sum: WeightedSum(value=8.6, variance=5.96) (WeightedSum(value=9.5, variance=6.77) with flow)
>>> import dask_histogram
>>> dask_histogram.__version__
'2024.3.0'
@agoose77 agoose77 self-assigned this Jul 30, 2024
agoose77 (Collaborator) commented Jul 30, 2024

This error arises because dask-awkward does not implement support for hist's fill_flattened method.

A rough implementation looks like:

import awkward as ak
import dask.array as da
import dask_awkward as dak


class _DaskAwkwardHistModule:
    @staticmethod
    def unpack(array):
        # Split a record array into a {field: array} mapping; None if it has no fields.
        if not ak.fields(array):
            return None
        return dict(zip(ak.fields(array), ak.unzip(array)))

    @staticmethod
    def broadcast_and_flatten(args):
        # Convert every argument to a dask-awkward array before broadcasting.
        new_args = []
        for arg in args:
            if isinstance(arg, dak.Array):
                new_args.append(arg)
            elif isinstance(arg, da.Array):
                new_args.append(dak.from_dask_array(arg))
            else:
                new_args.append(dak.from_awkward(ak.Array(arg, backend="numpy")))
        assert not any([x.fields for x in new_args])
        return tuple(
            ak.flatten(x, axis=None) for x in ak.broadcast_arrays(*new_args)
        )


# Attach the helper so that hist's interop lookup finds it on dask-awkward arrays.
dak.Array._histogram_module_ = _DaskAwkwardHistModule

I'm not sure how smart that will be with non-equally-partitioned collections.
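
To illustrate why assigning `_histogram_module_` on the collection class would sidestep the TypeError in the traceback, here is a minimal standard-library sketch. `FakeArray`, `LazyCollection`, and `PatchedCollection` are hypothetical stand-ins, not dask-awkward's actual classes; the only behaviour borrowed from the traceback is that an attribute lookup falls through to `__getattr__`, which then rejects a non-array result.

```python
import types


class FakeArray:
    """Stand-in for a concrete array type (hypothetical, for illustration)."""

    # Concrete arrays expose a helper module through this attribute.
    _histogram_module_ = types.ModuleType("fake_hist_helpers")


class LazyCollection:
    """Hypothetical lazy wrapper; NOT dask-awkward's real implementation."""

    def __init__(self, meta):
        self._meta = meta

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails on the wrapper.
        if name == "_meta" or name.startswith("__"):
            raise AttributeError(name)
        # Forward the lookup to the metadata object and expect an array back.
        result = getattr(self._meta, name)
        if not isinstance(result, FakeArray):
            raise TypeError(
                f"meta must be an instance of FakeArray, not {type(result)}."
            )
        return LazyCollection(result)


class PatchedCollection(LazyCollection):
    # Defined on the class, so __getattr__ is never consulted for this name.
    _histogram_module_ = types.ModuleType("lazy_hist_helpers")


def find_histogram_module(arg):
    # Mirrors the attribute access shown in the traceback.
    return arg._histogram_module_
```

In this sketch, `find_histogram_module(LazyCollection(FakeArray()))` raises a TypeError that mentions `<class 'module'>`, while the same call on `PatchedCollection(FakeArray())` returns the helper module directly.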

@martindurant (Collaborator)

@agoose77, were you planning to submit that? We could restrict it to equally partitioned inputs and raise a clear error message when that is not the case.
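
A partition restriction with a clear message could be sketched roughly as follows. `check_same_partitioning` is a hypothetical helper name, not an existing dask-awkward or hist function; it relies only on the `npartitions` attribute that dask collections expose, and the `SimpleNamespace` objects are stand-ins so the sketch runs without dask installed. Note the assumption that comparing partition counts is sufficient: equal counts do not guarantee equal per-partition lengths.

```python
from types import SimpleNamespace


def check_same_partitioning(*collections):
    """Raise a descriptive error unless all inputs share one npartitions value.

    Hypothetical helper: it only compares partition counts, which is a
    necessary but not sufficient condition for aligned partitions.
    """
    counts = [c.npartitions for c in collections]
    if len(set(counts)) > 1:
        raise ValueError(
            "fill_flattened requires equally partitioned inputs, "
            f"got npartitions={counts}"
        )
    return counts[0] if counts else None


# Stand-ins for dask collections; only the npartitions attribute is used.
values = SimpleNamespace(npartitions=2)
weights = SimpleNamespace(npartitions=2)
mismatched = SimpleNamespace(npartitions=3)

shared = check_same_partitioning(values, weights)

try:
    check_same_partitioning(values, mismatched)
except ValueError as exc:
    message = str(exc)
```

Such a check could be called at the top of `broadcast_and_flatten`, before any broadcasting is attempted, so that users see the partitioning problem rather than a lower-level failure.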
