This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

All awkward types should have a "regular" method #111

Closed
jpivarski opened this issue Mar 26, 2019 · 1 comment

Comments

@jpivarski
Member

In all cases, AwkwardArray.regular should turn the awkward array into a plain ol' Numpy array, if possible. JaggedArray already has such a method, but it would also be useful in UnionArray when all of its contents are Numpy arrays.
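As a rough illustration of what "regular" means for a jagged array, here is a minimal sketch in pure Numpy. It assumes the jagged array is represented as a flat `content` array plus an `offsets` array; the function name `jagged_to_regular` is hypothetical and is not awkward's API. It only succeeds when every sublist has the same length.

```python
import numpy as np

def jagged_to_regular(content, offsets):
    """Turn a jagged array (flat content + offsets) into a 2-D Numpy array.

    Hypothetical helper for illustration only; not awkward's API.
    Raises ValueError when the sublists do not all have the same length.
    """
    content = np.asarray(content)
    offsets = np.asarray(offsets)
    counts = np.diff(offsets)                  # length of each sublist
    if len(counts) == 0:
        return np.empty((0, 0), dtype=content.dtype)
    if not np.all(counts == counts[0]):
        raise ValueError("sublists have different lengths; cannot be regular")
    # With offsets, the sublists sit back to back in content, so a single
    # slice followed by a reshape gives the rectangular array.
    return content[offsets[0]:offsets[-1]].reshape(len(counts), counts[0])
```

For example, `jagged_to_regular([1, 2, 3, 4, 5, 6], [0, 2, 4, 6])` gives a 3×2 array, while offsets `[0, 1, 3]` raise `ValueError` because the sublists have lengths 1 and 2.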

This is related to #84, the request for an awkward.where(flatbool, a, b) function that takes a[i] if flatbool[i] is true and b[i] otherwise. Such a function would always return a UnionArray, except for any specialized cases we choose to single out and implement separately. The simple implementation of where would be:

```python
import numpy
from awkward import UnionArray

def where(condition, x, y):
    condition = numpy.asarray(condition)
    assert len(condition) == len(x) == len(y)
    return UnionArray(condition.astype(UnionArray.TAGTYPE),  # bools -> tags
                      numpy.arange(len(x)),                  # trivial indexes
                      [y, x])                                # False -> tag 0, which is y
```

which basically just delays the application of the condition. Delayed evaluation is necessary because x and y might be too complex to mix eagerly, even if they have the same type. (That's why specialized cases could be handled differently, but for uniformity they probably shouldn't be.) This isn't a problem for analysts: you can do all the `a["pt"][:, 0]` stuff on a UnionArray, just as you would on any jagged table.
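To see why building a union only delays the condition, here is a minimal pure-Python/Numpy sketch of the same tags-plus-index idea. `LazyUnion` and `lazy_where` are hypothetical stand-ins, not awkward's UnionArray; the point is that x and y can be ragged structures that numpy.where could not mix directly, and each element is looked up only when it is requested.

```python
import numpy as np

class LazyUnion:
    """Minimal stand-in for a union: tags pick a content, index picks an element.

    Hypothetical class for illustration; awkward's UnionArray does much more.
    Building it copies nothing from the contents.
    """
    def __init__(self, tags, index, contents):
        self.tags = np.asarray(tags)
        self.index = np.asarray(index)
        self.contents = contents

    def __len__(self):
        return len(self.tags)

    def __getitem__(self, i):
        # The condition is only "applied" here, when an element is requested.
        return self.contents[self.tags[i]][self.index[i]]

def lazy_where(condition, x, y):
    condition = np.asarray(condition)
    assert len(condition) == len(x) == len(y)
    # True -> tag 1 (x), False -> tag 0 (y); index i points at element i.
    return LazyUnion(condition.astype(np.uint8), np.arange(len(x)), [y, x])
```

With `x = [[1, 2], [3], []]` and `y = [[10], [20, 30], [40]]`, `lazy_where([True, False, True], x, y)` yields `[1, 2]`, `[20, 30]` and `[]` for indexes 0, 1 and 2.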

But then, to make this useful, you need a way to eventually evaluate the where once you've broken the structure down to something that can be pure Numpy. UnionArray.regular could be:

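```python
def regular(self):
    # Only a union of plain Numpy arrays can be flattened into one Numpy array.
    if not all(isinstance(x, self.numpy.ndarray) for x in self._contents):
        raise TypeError("UnionArray.regular requires all contents to be Numpy arrays")
    out = self.numpy.empty(len(self.index), dtype=self.dtype)  # see UnionArray.dtype
    for tag, content in enumerate(self._contents):
        mask = (self._tags == tag)       # positions that draw from this content
        out[mask] = content[self.index[mask]]
    return out
```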

which generalizes numpy.where from two choices to any number of contents.
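The same evaluation can be checked outside of awkward as a free function over plain Numpy arrays. This is a sketch under stated assumptions: `union_to_regular` is a hypothetical name, every content is assumed to be a 1-D ndarray, and the output dtype is assumed to follow Numpy's `result_type` promotion (standing in for UnionArray.dtype). With two contents and tags taken from a boolean condition, it reproduces numpy.where.

```python
import numpy as np

def union_to_regular(tags, index, contents):
    """Evaluate a union of 1-D Numpy arrays into a single Numpy array.

    Hypothetical standalone version of the UnionArray.regular idea.
    """
    tags = np.asarray(tags)
    index = np.asarray(index)
    if not all(isinstance(c, np.ndarray) for c in contents):
        raise TypeError("all contents must be Numpy arrays")
    # Common dtype of all contents (assumed promotion rule).
    dtype = np.result_type(*[c.dtype for c in contents])
    out = np.empty(len(index), dtype=dtype)
    for tag, content in enumerate(contents):
        mask = tags == tag                  # positions drawn from this content
        out[mask] = content[index[mask]]
    return out
```

For example, with `cond = np.array([True, False, True, False])`, `x = np.array([1, 2, 3, 4])` and `y = np.array([10.5, 20.5, 30.5, 40.5])`, `union_to_regular(cond.astype(np.uint8), np.arange(4), [y, x])` matches `np.where(cond, x, y)`, and with three contents it simply picks from whichever content each tag names.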

@jpivarski
Member Author

Implemented in PR #142, but the regular methods don't all have unit tests yet.
