Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update groupby to return the key and the group, similar to itertools and pandas #225

Open
znicholls opened this issue Apr 3, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@znicholls
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

Currently, our groupby method doesn't return the keys of the groups. This is at odds with other packages (e.g. pandas and itertools) and also forcers developers to get the keys within loops (which is fine, but a bit more annoying and less slick).

Describe the solution you'd like

We change the API from

for group in run.groupby(groups):
    # User has to get key themselves
    key = [group.get_unique_meta(g, True) for g in groups]

to

# more standard API
for key, group in run.groupby(groups):

Describe alternatives you've considered

Leave thing as is

Additional context

This would be a major breaking change, so we would probably want to roll it out with a keyword argument to control behaviour and a deprecation warning for at least two minor releases before making the change.

@znicholls znicholls added enhancement New feature or request good first issue Good for newcomers labels Apr 3, 2023
@lewisjared
Copy link
Collaborator

What is the type of key in this case? My view is that it needs to be something that include the column names such as apd.Series or dict. I think that pandas returns the keys as a list if a MultiIndex is used, which isn't very useful as it requires knowing the group keys. This pandas behaviour needs to be confirmed

@znicholls
Copy link
Collaborator Author

znicholls commented Apr 3, 2023

What is the type of key in this case?

Tuple of strings I believe.

My view is that it needs to be something that include the column names such as apd.Series or dict

I'm not sure why, but neither pandas nor itertools does it that way. I guess the idea is that, if you really want that, you can do

for key, group in thing.groupby(groups):
    group_into = {group_name: k for group_name, k in zip(groups, key)}

I think that pandas returns the keys as a list if a MultiIndex is used, which isn't very useful as it requires knowing the group keys

I think the idea is that you always know keys, because the user makes the call to groupby in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants