Check ak.firsts/ak.singletons semantics before the 2.0.0 release #1882

Closed
jpivarski opened this issue Nov 17, 2022 · 1 comment · Fixed by #1968
@jpivarski
Member

Inspired by #983 and CoffeaTeam/coffea#750, we need to take another look at ak.firsts and ak.singletons to make sure that they behave sensibly for all array types (e.g. that high-coverage suite of layouts copy-pasted from one v2 test to another) and add an axis parameter if it does not exist.

What firsts is supposed to do is: turn an array of var * X into an array of option[X] (of the same length) by returning the first item of each non-empty list and None for empty lists. ak.firsts(array) is like array[:, 0] except that it does not raise an exception if there are any empty lists. This has a natural extension to any axis and should work for lists buried within anything.
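To make the intended behavior concrete, here is a minimal plain-Python sketch of those semantics on nested lists. It is only a model, not Awkward Array's implementation: `firsts_model` is a hypothetical name, and the recursive handling of `axis` is an assumption about what "any axis" would mean.

```python
def firsts_model(data, axis=1):
    """Plain-Python model of the intended firsts semantics (hypothetical helper)."""
    if axis < 1:
        raise ValueError("this sketch only handles axis >= 1")
    if axis == 1:
        # Each element of `data` is a list: take its first item, or None if it is empty.
        return [sub[0] if len(sub) > 0 else None for sub in data]
    # For deeper axes, descend one level and apply the same rule there.
    return [firsts_model(sub, axis - 1) for sub in data]


print(firsts_model([[1.1, 2.2], [], [3.3]]))
# [1.1, None, 3.3]
print(firsts_model([[[1, 2], []], [[3]]], axis=2))
# [[1, None], [3]]
```

The output has the same length as the input at every level above the chosen axis, and an empty list never raises, which is the difference from `array[:, 0]`.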

What singletons is supposed to do is: turn an array of option[X] into an array of var * X where each None maps to [] and each instance x of X maps to [x]. This can also be done at any axis.
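The same kind of plain-Python sketch works for this direction. Again this is a model of the described semantics rather than the library's code: `singletons_model` and its `depth` parameter are hypothetical, with `depth=0` meaning the optional values are the direct elements of the input.

```python
def singletons_model(data, depth=0):
    """Plain-Python model of the intended singletons semantics (hypothetical helper)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth == 0:
        # None becomes an empty list; any other value x becomes [x].
        return [[] if x is None else [x] for x in data]
    # For options buried deeper, descend one level and apply the same rule there.
    return [singletons_model(sub, depth - 1) for sub in data]


print(singletons_model([1.1, None, 3.3]))
# [[1.1], [], [3.3]]
print(singletons_model([[1, None], [3]], depth=1))
# [[[1], []], [[3]]]
```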

The two functions are inverses of each other, and exact inverses when firsts acts only on lists of length 0 or 1. These are the two ways of representing missing data: with a missingness token (None, which Awkward Array favors in all operations except this one) and with lists of length 0 and 1. (Option type is a monoid; I think only functional languages take this seriously.)
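The inverse relationship can be checked with a small plain-Python sketch. The two helpers below are hypothetical one-level models of the semantics described in this issue, not Awkward Array's implementation, and the example assumes the values themselves are never None.

```python
def firsts_model(lists):
    # First item of each list, or None for an empty list.
    return [sub[0] if sub else None for sub in lists]


def singletons_model(options):
    # None becomes [], any other value x becomes [x].
    return [[] if x is None else [x] for x in options]


options = [1.1, None, 3.3]
short_lists = [[1.1], [], [3.3]]      # lengths 0 and 1 only
longer_lists = [[1.1, 2.2], [], [3.3]]

# Starting from options, the round trip recovers the input.
print(firsts_model(singletons_model(options)) == options)            # True
# Starting from lists of length 0 or 1, the round trip is also exact.
print(singletons_model(firsts_model(short_lists)) == short_lists)    # True
# With a longer list, firsts discards items, so the round trip is lossy.
print(singletons_model(firsts_model(longer_lists)) == longer_lists)  # False
```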

Both functions rank fairly high by number of uses across GitHub:

233 awkward.firsts(#)            <------
244 awkward.prod(#, axis=)
251 awkward.fromiter(#)
255 awkward.zip(#, with_name=)
255 awkward.singletons(#)        <------
264 awkward.all(#, axis=)
283 awkward.concatenate(#, axis=)
334 awkward.unflatten(#, #)
349 awkward.any(#, axis=)
368 awkward.fill_none(#, #)
388 awkward.sum(#)
422 awkward.count_nonzero(#)
446 awkward.Array(#)
473 awkward.where(#, #, #)
560 awkward.from_iter(#)
747 awkward.sum(#, axis=)
952 awkward.to_numpy(#)
1526 awkward.flatten(#)
1561 awkward.num(#)

and by number of repositories in which they appear across GitHub:

13 awkward.singletons(#)        <------
13 awkward.local_index(#)
13 awkward.max(#)
13 awkward.count(#)
14 awkward.zip(#, depth_limit=)
14 awkward.from_parquet(#)
14 awkward.argsort(#, ascending=)
15 awkward.prod(#, axis=)
15 awkward.firsts(#)            <------
16 awkward.to_list(#)
17 awkward.max(#, axis=)
18 awkward.ones_like(#)
21 awkward.min(#, axis=)
22 awkward.combinations(#, #)
24 awkward.unzip(#)
24 awkward.all(#, axis=)
24 awkward.zeros_like(#)
24 awkward.all(#)
25 awkward.values_astype(#, #)
25 awkward.to_pandas(#)
25 awkward.broadcast_arrays(#, #)
26 awkward.count(#, axis=)
26 awkward.flatten(#, axis=)
28 awkward.concatenate(#)
28 awkward.behavior.update(#)
28 awkward.num(#, axis=)
34 awkward.concatenate(#, axis=)
35 awkward.zip(#, with_name=)
36 awkward.zip(#)
37 awkward.any(#, axis=)
38 awkward.unflatten(#, #)
38 awkward.sum(#)
38 awkward.where(#, #, #)
44 awkward.fill_none(#, #)
55 awkward.sum(#, axis=)
59 awkward.to_numpy(#)
63 awkward.num(#)
70 awkward.Array(#)
70 awkward.flatten(#)
@jpivarski jpivarski added the pr-next-release Required for the next release label Nov 17, 2022
@jpivarski
Member Author

The only thing to do for next-release is to put warnings into the functions saying that the semantics will likely change. With the warning, this issue doesn't close, but the next-release label can be removed.

@jpivarski jpivarski self-assigned this Nov 30, 2022
@jpivarski jpivarski removed the pr-next-release Required for the next release label Feb 15, 2023