Fix: revert #1249 #1335
This reverts commit a219c85.
Oh dear, this is still not right: we actually need the existing complex solution because of Union arrays, which can themselves contain options.
I don't think it's necessary to revert #1249. It may be overly cautious, but it doesn't do anything beyond O(1) to do so. I'd just leave it be (since there's so much else that needs to be done...).
@jpivarski right, and also we need this to support
That's definitely something I hadn't considered. We don't necessarily need to support it in the sense of ensuring that every function can deal with it. We could, instead, expand the set of node-nestings that are considered invalid. The above could become one of those cases.

Arrow, for instance, doesn't consider "option-type of union-type" to be valid, which was a surprise to me because Arrow puts option-type on absolutely every node in their system (missing data is a node attribute, rather than a separate node). The argument is that, although for a record

    {x: None, y: None, z: None}

is different from

    None

it does not make a difference for unions:

    None   # happens to be content 2, rather than content 0, but we don't know that

is the same as

    None   # for the whole union, all contents together

We could define a rule that turns every
into a canonical form. The best trade-off between memory and time (so, excluding BitMaskedArray) would be this:
where all of the UnionArray's

Usually, we do option-type with IndexedOptionArray because it's the most general; you don't have to create fake data for missing values in RecordArrays (as Arrow does). In this case, a ByteMaskedArray makes more sense because the UnionArray's

That would tend to make the

This could be a new project, if you're interested.
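To make the idea of a canonical form concrete, here is a minimal sketch in plain Python. It is a toy model, not Awkward's API: the function names `canonicalize_option_union` and `to_list`, and the representation of a union as `tags`, `index`, and a list of Python lists, are assumptions made for illustration only. It moves every missing value, whether it sits inside one of the union's contents or in an outer mask, into a single outer byte mask, so that the union's contents hold no `None` at all.

```python
from typing import Any, List, Optional, Tuple


def canonicalize_option_union(
    tags: List[int],
    index: List[int],
    contents: List[List[Any]],
    outer_mask: Optional[List[bool]] = None,
) -> Tuple[List[bool], List[int], List[int], List[List[Any]]]:
    """Toy model: collapse all missing-ness into one outer byte mask.

    Returns (mask, tags, index, contents), where mask[i] is True for a
    valid element and the returned contents contain no None.
    """
    n = len(tags)
    if outer_mask is None:
        outer_mask = [True] * n

    # Compact each content by dropping None, remembering where each
    # surviving value moved to.
    new_contents: List[List[Any]] = []
    remap: List[dict] = []
    for content in contents:
        kept: List[Any] = []
        positions = {}
        for old_pos, value in enumerate(content):
            if value is not None:
                positions[old_pos] = len(kept)
                kept.append(value)
        new_contents.append(kept)
        remap.append(positions)

    mask: List[bool] = []
    new_index: List[int] = []
    for i in range(n):
        tag, old_pos = tags[i], index[i]
        # Valid only if neither the outer mask nor the content says "missing".
        valid = outer_mask[i] and old_pos in remap[tag]
        mask.append(valid)
        # Masked-out entries still need some index; 0 is a placeholder.
        new_index.append(remap[tag][old_pos] if valid else 0)

    return mask, list(tags), new_index, new_contents


def to_list(mask, tags, index, contents):
    """Materialize the toy masked union as a Python list."""
    return [contents[t][j] if ok else None for ok, t, j in zip(mask, tags, index)]


tags = [0, 1, 1, 0, 1]
index = [0, 0, 1, 1, 2]
contents = [[1, None], ["a", None, "c"]]  # option-type inside each content

mask, new_tags, new_index, new_contents = canonicalize_option_union(tags, index, contents)
print(to_list(mask, new_tags, new_index, new_contents))
```

In this toy model the two `None` values, which originally lived in different contents, become indistinguishable entries in the one outer mask, which is the point made above about unions.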
I'm going to move this to a new issue!
This reverts #1249, which solved a problem that shouldn't happen with Awkward-derived arrays, and adds the proper implementation for v2.
Awkward should never produce arrays with nested option types or index types, so we should not need to handle such cases when visiting layouts. This is an oversight on my part, which this PR will fix.
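As an illustration of the invariant stated here, the following sketch walks a toy layout tree and reports any option node that directly wraps another option node. The `Node` class and `nested_option_paths` function are hypothetical and are not part of Awkward; only the option-type layout names in `OPTION_KINDS` are taken from the library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Layout classes that carry option-type in Awkward Array.
OPTION_KINDS = {"IndexedOptionArray", "ByteMaskedArray", "BitMaskedArray", "UnmaskedArray"}


@dataclass
class Node:
    """Hypothetical stand-in for a layout node: a kind plus child nodes."""
    kind: str
    contents: List["Node"] = field(default_factory=list)


def nested_option_paths(node: Node, path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the path to every option node whose direct parent is also an option node."""
    here = path + (node.kind,)
    found: List[Tuple[str, ...]] = []
    for child in node.contents:
        if node.kind in OPTION_KINDS and child.kind in OPTION_KINDS:
            found.append(here + (child.kind,))
        found.extend(nested_option_paths(child, here))
    return found


ok = Node("ListOffsetArray", [Node("IndexedOptionArray", [Node("NumpyArray")])])
bad = Node("ByteMaskedArray", [Node("IndexedOptionArray", [Node("NumpyArray")])])

print(nested_option_paths(ok))
print(nested_option_paths(bad))
```

A check like this only covers direct option-in-option nesting; the option-inside-union case raised in the conversation would need its own rule.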